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1  Introduction 

Analyses  of  time  to  disease  onset  and,  separately,  time  to  death  in  cohort 
studies  are  well  established;  see,  for  example,  Lee  (1980).  Analyses  that  ac¬ 
count  for  the  natural  progression  of  disease  and  death  are  lacking,  however. 
For  example,  a  standard  device  is  to  ignore  time  and  score  the  occurrence 
of  disease  or  death  with  or  from  the  disease  as  a  failure.  To  account  for 
time,  investigators  sometimes  use  the  first  appearance  of  the  disease,  either 
at  diagnosis  or  at  death.  Neither  of  these  approaches  explicitly  distinguishes 
chronic  from  rapidly  lethal  diseases  or  the  effect  of  the  treatment  on  lethality. 
To  that  end,  we  consider  the  nonparametric  estimation  of  the  joint  distri¬ 
bution  of  the  time-to-onset  of  a  specific  disease  and  time-to-death  in  cohort 
studies. 

Time-to-onset  and  time-to-death  data  are  subject  to  many  possible  pat¬ 
terns  of  right  and  left  censoring  on  the  pairs  of  times-to-event.  As  pointed 
out  by  Kaplan  and  Meier  (1958),  the  nonparametric  approach  to  distribution 
estimation  requires  the  discretizing  of  continuous  data.  In  the  case  of  the  two 
dimensional  time-to-event  vectors  considered  in  this  paper,  discretizing  the 
data  means  partitioning  the  two  dimensional  space  determined  by  time-to- 
onset  and  time-to-death  into  rectangles.  A  joint  probability  estimate  will 
then  be  an  estimate  of  the  probability  of  a  particular  rectangle.  The  case  in 
which  death  occurs  before  the  onset  of  disease  is  also  covered  by  our  methods. 

Let  T  denote  the  time-to-onset  of  the  disease  onset  and  D  denote  the 
time-to-death.  These  times  may  be  measured  from  birth  or  some  other  point 
in  time,  such  as  the  beginning  of  a  period  of  exposure  to  an  environmental 
contaminant.  Our  goal  is  to  estimate  the  probability  distribution  of  the  pair 
(T,D).  The  basic  method  for  nonparametric  estimation  of  this  distribution  is 
to  divide  the  first  quadrant  of  the  two  dimensional  (T,D)  plane  into  rectan¬ 
gles  . . . ,  Rij  and  estimate  the  probability  that  (T,D)  will  fall  in  Rij  by 
the  sample  proportion  of  observed  (T,D)  pairs  that  occurred  in  Rij.  Deaths 
without  disease  are  included  by  augmenting  the  set  of  two  dimensional  rect¬ 
angles  by  the  set  of  intervals  Ri,...,R_j  on  the  positive  real  line,  which 
are  interpreted  as  intervals  for  death  times  of  individuals  who  died  without 
the  disease.  The  probability  estimates  are  the  sample  proportion  of  those 
individuals  who  died  without  the  disease  and  whose  death  times  fell  into  the 
interval  R,j.  j  =  1, . . . ,  J. 
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Morbidity /mortality  data  are  subject  to  a  high  degree  of  censoring  on 
one  or  both  of  the  members  of  the  pair  (T,D)  because  the  disease  may  not 
be  rapidly  lethal,  death  can  occur  prior  to  the  onset  of  disease  and  because 
many  studies  are  analyzed  before  all  of  the  subjects  have  died.  Both  T  and 
D  may  be  subject  to  left  as  well  as  right  censoring.  For  example,  the  disease 
may  have  been  present  at  the  time  that  the  subject  entered  the  study,  in 
which  case  T  would  be  left  censored. 

We  use  the  iterative  Estimation  Maximization  (EM)  algorithm  of  Demp¬ 
ster,  Laird  and  Rubin  (1977)  to  find  the  maximum  liklihood  estimate  of 
the  joint  distribution  of  (T,D).  At  any  iteration,  this  algorithm  distributes 
censored  observations  among  the  rectangles  that  are  consistent  with  the  ob¬ 
served  censoring  pattern  according  to  a  current  set  of  estimated  probabilities. 
When  all  observations  have  been  distributed  in  this  way,  each  rectangle  will 
have  a  set  of  ’’pseudo-counts”  assigned  to  it.  Pseudo-proportions  are  calcu¬ 
lated  using  the  pseudo-counts.  The  pseudo-proportions  are  used  in  the  next 
iteration  as  a  new  set  of  estimated  probabilities  to  redistribute  the  censored 
observations  producing  another  set  of  pseudo-counts.  The  iterative  process 
continues  until  the  sequence  of  estimated  probabilities  converges.  We  dis¬ 
tribute  each  observation  separately  among  its  potential  rectangles  because 
the  final  set  of  distributions  for  each  observation  is  needed  to  apply  the 
method  of  Louis  (1982)  for  estimating  the  covariance  matrix  of  the  vector  of 
probability  estimates  produced  by  the  EM  algorithm. 

2  Notation 

The  first  quadrant  of  two  dimensional  (T,D)  space  is  partitioned  by  rectangles 
Rij,  z  =  1, . . . ,  /  and  j  =  1, . . . ,  J,  where  the  boundaries  of  the  rectangles  Rij 
are  defined  by  the  points  £  =  1,. ,  max{I,  J),  according  to 

Rij  =  {(t,d):  Ti  <  t  <  Ti+i,  Tj  <d<  Tj+i}, 

where  rj+i  and  rj+i  are  defined  to  be  oo.  We  also  define 

R.j  =  {(t,d):  t  >  d  and  rj  <t  <  Tj+i}, 

which  are  the  rectangles  that  account  for  deaths  without  disease.  The  re¬ 
sulting  probability  distribution  is  defined  as 

Pij  =  -P((T,D)  eRij)  ,z  =  1, . . . j  =  1, . . . ,  J, 

and 

P-j  =  ■P((T,D)  eR,j)  = 

Using  the  same  set  of  endpoints  for  both  disease  and  death  allows  the  deter¬ 
mination  of  the  potential  intervals  for  a  censored  observation. 
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The  censoring  patterns  can  be  described  in  the  following  way.  For  a  time- 
to-event  random  variable  X,  denote  two  independent  censoring  variables  as 
cXL  and  cXR.  X  is  left  censored  and  cXL  is  observed  if  X  <  cXL.  X  is 
right  censored  and  cXR  is  observed  if  X  >  cXR.  X  is  interval  censored  if 
cXR  <  X  <  cXL,  in  which  case  the  information  about  X  is  that  it  lies  in  the 
interval  (cXR,  cXL).  Thus,  four  censoring  indicators,  two  for  T  and  two  for 
D,  are  required.  These  are  defined  as 

iTL  =  1  if  T  is  left  censored,  0  otherwise 

iTR  =  1  if  T  is  right  censored,  0  otherwise 

iDL  ==  1  if  D  is  left  censored,  0  otherwise 

iDR  =  1  if  D  is  right  censored, 0  otherwise. 


3  The  Discrete  Likelihood 

In  the  discrete  model  with  no  censoring,  one  observes  a  frequency  for  each  of 
(I+1)J  rectangles,  Rij  and  R,j,  i  =  1, . . . ,  /  and  j  =  1, . . . ,  J.  When  censor¬ 
ing  is  present,  each  observation  gives  information  about  (T,D)  via  its  censor¬ 
ing  pattern,  given  by  the  vector  of  censoring  indicators, 
(iTL,  iTR,  iDL,  iDR),  and  the  observed  values  of  T  and  D  or  their  censoring 
variables.  This  information  indicates  which  of  the  Rij  or  Rj  are  possible 
for  the  actual  value  of  the  vector  (T,D).  There  are  2^  censoring  patterns  for 
cases  for  which  it  is  known  that  disease  occurred  before  death.  There  are 
four  censoring  patterns  for  cases  for  which  it  is  known  that  death  occurred 
before  disease  and  there  are  four  censoring  patterns  for  cases  is  which  noth¬ 
ing  is  known  about  the  time-to-onset  of  disease.  Table  1  lists  the  censoring 
patterns  and  each  pattern’s  contribution  to  the  discrete  likelihood  for  the 
cases  for  which  something  is  known  about  the  time-to-onset  of  disease.  Ta¬ 
ble  2  lists  the  censoring  patterns  and  likelihood  contibutions  for  the  cases  for 
which  nothing  is  known  about  the  time-to-onset  of  disease.  In  Table  1,  the 
expression  T=  indicates  that  <  T  <  Similar  notation  is  used  for 
D  in  both  Tables  1  and  2.  In  general,  T  and  D  are  considered  discrete  with 
possible  values  n,...,  r^ax(I,J)- 
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Table  1 


Censoring  Patterns  and  Likelihood  Contributions 


Censoring 

Pattern 

(iTL,iTR,iDL,iDR) 

T 

D 

Likelihood 

Contribution 

1 

(0,0, 0,0) 

II 

D=  Ts 

Pus 

2 

(1,0,0,0) 

T< 

D=  Ts 

^Pqs 

9=1 

min{I,s) 

3 

(0, 1,0,0) 

T> 

D=  Ts 

'll,  Pqs 

q=u 

4 

(0,0, 1,0) 

T=  r„ 

T'li  <  D  < 

s 

'Z,Py^ 

i=u 

5 

(0,0,0,1) 

<  D 

J 

e-s 

6 

(1,1,0,0) 

rn  <  T  <  r. 

D=  Ts 

U 

Pqs 

q=L 

7 

(1,0, 1,0) 

T<  ru 

T  <  D  < 

u  s 

'EiZPqi 

q=l  t=q 
u  J 

8 

(1, 0,0,1) 

T< 

T<  <  D 

EEPqe 

q=l e=s 
min{I,s)  s 

9 

(0,1,1,0) 

T>  Tu 

T<  D  < 

E  llPqe 

q=u  e=q 
s  J 

10 

(0,1,0, 1) 

T> 

T<  <  D 

il'llPqi 

g=u£=s  • 

11 

(0,0, 1,1) 

T= 

D  - 

JlPue 

£=L 
u  s 

12 

(1, 1,1,0) 

<  T  < 

T<  D  <  rs 

llilPqe 

q^L  l=q 
u  J 

13 

(1,1, 0,1) 

<  T  < 

<  ’"s  <  D 

llilPqe 

q=L  i=s 
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Table  1  (Continued) 

Censoring  Likelihood 


Pattern 

(iTL,iTR,iDL,iDR) 

T 

D 

Contribution 

14 

(1,0,1, 1) 

T<  Vu 

ru<rL<'D<rs 

U  S 

'EY!  Pie 

q=l  £=L 
s  $ 

15 

(0,1,1, 1) 

I'm  <  T 

max(T,r£,)  <  D  <  Ts 

E  E  Pie 

fcmax(g,L) 

'  16 

(1,1, 1,1) 

<  T  < 

r.<r,<D<r. 

EE  Pie 

q=L  t=t 

17 

(.,.,0,0) 

no  T 

II 

Q 

P.S 

18 

(.,.,1,0) 

no  T 

D<  Ts 

EP.e 

t=\ 

j  I  j 

19 

(.,.,0,1) 

no  T 

<  D 

EP-e^EEPie 

I-S  q=siz=zq 

s  $  s 

20 

(.,.,1,1) 

no  T 

ri  <'D  <rs 

EP-e+EEPie 

£z=L  q—L  £=q 

In  censoring  patterns  17  through  20,  the  disease  is  known  not  to  have 
occurred  before  death  or  before  the  observed  censoring  time  or  times  for 
death.  In  pattern  17,  the  death  time  is  uncensored  and  it  is  known  that 
the  subject  died  without  the  disease.  In  pattern  18,  death  is  left  censored  so 
that  the  only  information  about  time-to-onset  of  the  disease  is  that  it  did  not 
occur  before  the  left  censoring  value  of  time-to-death.  In  pattern  19,  time- 
to-death  is  right  censored  so  that  it  is  only  known  that  the  onset  of  disease 
did  not  occur  before  this  censoring  point,  but  may  have  occurred  between 
the  right  censoring  time  for  time-to-death  and  the  actual  time-to-death.  In 
pattern  20,  the  time-to-death  is  interval  censored  and  it  is  assumed  that  the 
time-to-onset  did  not  occur  before  the  left  endpoint  of  the  interval  censoring 
time-to-death. 

Table  1  lists  censoring  patterns,  for  which  at  least  some  information  is 
known  about  both  time-to-onset  of  disease  and  time-to-death.  It  is  possible 
that  only  time-to-death  is  known  and  that  no  information  regarding  disease 
history  is  available.  In  this  case,  technically,  time-to-onset  of  disease  is  right 
censored  at  time  zero.  There  are  four  possible  patterns  for  this  situation,  cor¬ 
responding  to  the  four  possible  censoring  patterns  for  time-to-death.  Table  2 
lists  these  four  possible  censoring  patterns  and  their  likelihood  contributions. 
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Table  2 

Censoring  Patterns  and  Likelihood  Contributions 


For  Subjects  with  Time-to-Death  Information  Only 


Censoring 

Pattern 

(iTL,iTR,iDL,iDR) 

T 

D 

Likelihood 

Contribution 

min{I,s) 

21 

(0,1, 0,0) 

T>0 

D=  rs 

S  Pqs+P.s 
g=0 

min{I,s)  s  s 

22 

(0,1, 1,0) 

T>0 

D  <rs 

12  12Pqi-^12P-i 

9=0  e=q  £=1 

I  J  I  J 

23 

(0,1,0,1) 

T>0 

J's  <  D 

E  E  ft^+EEp.< 

g=0  i=max(q,s)  £=g 

24 

(0,1, 1,1) 

T>0 

n,  <  D  <  r* 

E  E  p,.  +  Ep.^ 

9-0f=max(9,L) 

4  The  EM  Algorithm 

Each  likelihood  contribution  listed  in  Tables  1  and  2  is  the  sum  of  all  of 
the  probabilities  of  rectangles  that  are  consistent  with  the  censoring  pattern. 
The  EM  algorithm  distributes  the  censored  observations  as  pseudo-counts 
among  the  rectangles  that  are  consistent  with  the  censoring  pattern.  The 
distribution  of  censored  observations  is  defined  by  the  conditional  probabil¬ 
ity  that  the  actual  value  of  (T,D)  would  occur  in  the  rectangle  given  the 
censoring  pattern.  This  conditional  probability  is  the  ratio  of  the  uncondi¬ 
tional  probability  of  (T,D)  occurring  in  the  rectangle  divided  by  the  sum  of 
probabilities  for  all  rectangles  consistent  with  the  censoring  pattern.  This 
ratio  is  simply  the  unconditional  rectangle  probability,  pij  or  p.j,  divided  by 
the  likelihood  contribution  as  listed  in  Tables  1  and  2.  For  example,  an  obser¬ 
vation  with  censoring  pattern  1  (Table  1)  is  completely  uncensored  and  thus 
contributes  0  to  all  rectangles  except  for  Rus,  which  is  allocated  a  1.  Letting 
Pi  indicate  the  likelihood  contribution  of  an  observation  of  censoring  pattern 
i,  an  observation  with  censoring  pattern  2  contributes  Pij/P2  to  rectangles 
Rij,  where  i  =  1, . . . ,  u  and  j  =  s  and  allocates  0  to  all  other  rectangles. 
Each  censoring  pattern  defines  a  set  of  possible  rectangles  in  which  the  pair 
(T,D)  could  have  actually  occurred.  The  EM  algorithm  begins  with  a  set  of 
rectangle  probabilities,  and  =  1, . . . ,  7  and  j  =  1, . . . ,  J.  The  sum 
of  all  data  allocations  for  each  rectangle  form  the  first  set  of  pseudo-counts, 
c]j  and  cb,z  =  1, . . . ,  I  and  y  =  1, . . . ,  J,  for  the  rectangles.  In  the  next  it¬ 
eration,  a  new  set  of  rectangle  probabilities,  pL  and  z  =  1, . . . ,  I  and  j  = 
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1, . . . ,  J,  are  estimated  by  the  sample  proportions  of  the  pseudo-counts. 
That  is,  according  to 

P\j  =  2  =  1, . . . ,  /  and  j  =  1, . . . ,  J, 

P^.j  =  j  =  1,  •  •  • , 

where  n  is  the  total  sample  size.  In  the  following  iteration,  the  new  set  of 
probabilities  is  used  to  reallocate  each  observation  to  form  another  set  of 
pseudo-counts.  The  process  continues  until  the  rectangle  probabilities  have 
converged. 


5  The  Information  Matrix 

Louis  (1982)  presented  a  method  for  estimating  the  information  matrix  of  a 
vector  of  parameter  estimates  obtained  using  the  EM  algorthm.  The  applica¬ 
tion  of  that  method  to  the  rectangle  probability  estimates  is  straightforward. 
Each  observation  can  be  represented  as  a  vector  of  indicator  functions 

{X.i,  .  .  .  ,Xj,Xii,Xi2,  . . .  ,Xij), 

where,  in  the  complete  data  case,  one  and  only  one  of  the  components  is  1 
and  the  others  are  0.  The  probability  distribution  for  each  observation  can 
be  written  as  a  vector  of  multinomial  probabilities 

{P.i,  ■  ■  ■  ,P.j,Pi\,Pn,  ■  ■  ■  ^Pij)- 

The  lower  triangular  portion  of  the  submatrix  formed  by  eliminating  the 
death-without-disease  row  must  have  zero  probabilities,  because  these  rect¬ 
angles  represent  the  impossibility  of  disease  occurring  after  death.  Eliminat¬ 
ing  these  impossible  rectangles  from  the  observation  and  parameter  vectors 
and  renumbering  the  remaining  rectangles  across  rows  and  down  columns 
(the  method  that  SAS^^  uses  for  numbering  two  dimensional  arrays),  we 
can  write  the  observation  and  parameter  vectors  as, 

x=  {xx,...,xk) 

and 

P  =  (Pi,---,Pk), 

where  K  =  (/  -I- 1)  J  —  (/  —  1)/ /2,  the  number  of  possible  rectangles.  Letting 

/(xjp)  =  /(xbi,...,pK-i), 


8 


the  probability  distribution  function  for  the  observation  vector  can  be  written 
as, 

K-l  K-1 

/(x|p)  =  XkPk  -  Xk(1  -  Pk)- 

k—1  k=l 

Following  Louis  (1982),  let 

For  our  multinomial  case  the  value  of  5(x,  p)  is 


'5(x,p)  =  (—  - 

P\ 


Pk' 


^'K-l  _ 
Pk-\  Pk 


Pj^  =  \-  Ef=/pfc- 

An  estimate  of  the  information  matrix  for  the  estimates  of  (pi, . . .  ,Pk-i) 
obtained  by  the  EM  algorithm  is, 

I/r-i  =  ^'5^(xr,p)5(xr,p), 

-  r=l 

where  Xr  is  the  rth  pseudo-data  observation  from  the  results  of  the  converged 
EM  algorithm. 

6  Epidemiological  Measures 

Given  a  large  sample,  the  vector  of  estimates 

P  = 

is  approximately  multivariate  normally  distributed  with  covariance  matrix 
Therefore,  the  delta  method  (Bishop,  Feinberg  and  Holland  (1975), 
page  486)  can  be  applied  to  obtain  the  asymptotic  distribution  of  any  differ¬ 
entiable  function  of  p. 

For  convenience,  we  compute  the  singular  covariance  matrix  of  the  en¬ 
tire  vector  of  estimates  {pi,  ■  ■  ■  ,Pk)-  This  covariance  matrix,  Ek,  is 
augmented  by  the  following  Kth  row  and  Kth  column. 


K-l  K-l  K-lK-l 

Kth  row  =  (-  (Tkl,  .  .  .  ,  -  XI  CTfel:  ^ks), 

k=l  fc=l  k=l  s=l 

where  aks  is  the  {k,  s)  element  of  The  Kth  column  is  the  transpose  of 
the  Kth  row. 
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The  morbidity /mortality  experience  of  two  groups  can  be  compared  by 
testing  the  equality  of  their  joint  probabihty  distributions,  assuming  the  same 
set  of  rectangle  boundaries  for  both  groups.  Let  the  vectors  of  estimates  be 
given  by  pi  and  p2  for  each  group  and  the  covariance  matrices  be  given  by 
and  Tj2,k-  Then  the  statistic 

(Pi  ~  P2)(L)i,a-  +  ^2,k)  (Pi  ~  P2)^, 

where  the  exponent  indicates  the  generalized  inverse,  has  approximately 
a  chi-square  distribution  with  K  —  1  degrees  of  freedom  under  the  null  hy¬ 
pothesis  that  the  two  joint  distributions  are  the  same. 

Relative  Risk: 

Given  two  groups,  say  exposed  (group  1)  and  controls  (group  2),  four 
relative  risk  measures  can  be  estimated.  The  relative  risk  of  dying  with 
disease  in  interval  j  after  contracting  the  disease  in  interval  i  is  defined  by 

RRij  Px,ij / P2,ij ’>  *  1 )  •  •  •  )  j  •  ■  •  ) 

where  pi^ij  and  p2,ij  si's  the  probabilities  for  the  rectangle  FUj  for  group  1  and 
group  2.  The  relative  risk  of  dying  without  disease  in  interval  j  is  defined  by 

RR.i  =  i  = 

where  pij  and  p2,j  are  the  jth  probabilities  in  the  vector  for  groups  1  and  2, 
respectively.  The  relative  risk  of  dying  with  disease  in  interval  j  is 

i  3 

RR'i  =  (5ZPl,ii)/(I^P2.ij),  j  =  1, . . . ,  J, 

t=l  i=l 

and  the  overall  relative  risk  of  dying  with  disease  is 

(llY.Phi3)l(^Y.P2,i3)- 

i—l  j=l  i=l  j=l 

Estimates  of  these  relative  risks  are  ratios  of  single  probability  estimates  or 
sums  of  probability  estimates.  Assuming  the  samples  from  groups  1  and  2 
are  independent,  the  numerators  and  denominators  are  independent.  The 
variance  of  a  sum  of  estimators  is  the  sum  of  all  the  entries  in  the  covariance 
matrix  corresponding  to  the  summands.  These  variances  can  be  calculated 
given  the  covariance  matrix  for  the  joint  probability  estimates  from  both 
groups.  Writing  the  relative  risk  (RR)  as  RR  =  A/ B  and  letting  ai  and  a2 
be  the  variances  of  A  and  B  and  ai2  the  covariance  of  A  and  B,  the  variance 
of  the  relative  risk  is 

O-RR  =  ~  ^^12^  +  0-2{  —  )'^). 
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Lethality: 

Lethality  is  defined  as  the  ratio  of  the  probability  of  dying  with  the  disease 
to  the  probability  of  dying  without  the  disease.  The  lethality  specific  to  a 
time-to-death  interval  j  is  the  ratio  of  the  probability  of  dying  in  interval 
j  with  disease  to  the  probability  in  interval  j  without  the  disease.  With 
respect  to  the  matrix  of  joint  probabilities  the  lethality  for  interval  j  is  the 
sum  of  all  probabilities  for  the  jth  column  from  rows  1  to  I  divided  by  the 
death- without-disease  probability  in  column  j.  The  formula  for  the  lethality 
in  column  j  is 

/ 

i=l 

The  covariance  matrix  for  the  vector  of  lethalities,  L^  =  (Li,...,Lj),  is 
computed  using  the  gradient  matrix,  gradL,  defined  as  the  J  by  K  matrix 
whose  (r,s)  element  is 

The  delta  method  then  implies  that  L  is  approximately  multivariate  nor¬ 
mally  distributed  with  covariance  matrix  equal  to 

Si  =  (gradL)  Si^(gradL)^. 

The  lethalities  of  the  two  groups  can  be  compared  by  using  the  chi-square 
statistic 


(gradLi  -  gradL2)(Eij  -f  (gradLi  -  gradL2)^, 
where  the  degrees  of  freedorh  are  equal  to  the  rank  of  Sii  -1-  Si2- 

7  Example 

This  example  compares  the  morbidity/mortality  experience  of  two  groups. 
One  group  (group  E)  was  exposed  to  an  environmental  contaminant,  while 
the  other  group  (group  C)  was  not  exposed.  Under  the  null  hypothesis,  both 
groups  are  random  samples  from  the  same  population.  Deaths  or  disease 
that  had  not  occurred  by  the  time  the  data  was  collected  were  right  censored 
at  the  calendar  time  of  the  collection.  Table  3  shows  the  distributions  of  the 
censoring  patterns  for  both  groups. 


Table  3 

Censoring  Pattern  Distributions  by  Group 
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Pattern 

Group  E 

Frequency  Percent 

Group  C 

Frequency  Percent 

1 

39 

3.2 

51 

0.3 

5 

461 

37.3 

575 

3.0 

8 

1 

0.1 

1 

0.0 

17 

36 

2.9 

50 

0.3 

19 

632 

51.2 

1016 

5.3 

21 

42 

3.4 

1572 

8.2 

23 

24 

1.9 

15791 

82.9 

Five-year  intervals  were  chosen.  Since  the  first  three  five-year  intervals 
were  sparse,  the  first  interval  was  taken  to  be  [0,15).  Subsequent  intervals 
were  defined  in  increments  of  five  years  and  the  final  interval  was  [35, -f). 
The  completely  uncensored  data  (censoring  patterns  1  and  17)  are  given  in 
Table  4.  There  were  no  uncensored  observations  with  time-to-death  greater 
than  30  in  group  E.  Because  the  rectangle  boundaries  must  be  the  same  for 
both  groups,  the  algorithm  reduced  the  the  time  intervals  for  group  C  to  end 
with  [25, -f). 

Table  4 

Frequencies  for  Uncensored  Data  by  Group 


Group  E 

Time-to- Onset 

[0,15) 

Time-to-Death 
[15,20)  [20,25) 

[25,+) 

Total 

death  w/o  disease 

2 

13 

16 

5 

36 

[0,15) 

2 

7 

4 

0 

13 

[15,20) 

0 

8 

7 

0 

15 

[20,25) 

0 

0 

6 

2 

8 

[25,+  ) 

0 

0 

0 

3 

3 

Total 

4 

28 

33 

10 

75 

Group  C 

Time-to-Onset 

[0,15) 

Time-to-Death 
[15,20)  [20,25) 

[25,+) 

Total 

death  w/o  disease 

3 

14 

25 

8 

50 

[0,15) 

1 

4 

2 

2 

9 

[15,20) 

0 

6 

7 

4 

17 

[20,25) 

0 

0 

10 

7 

17 

[25.+  ) 

0 

0 

0 

8 

8 

Total 

4 

24 

44 

29 

101 
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In  group  E,  the  rectangles  with  time-to-onset  in  (0,15)  and  [15,20)  and 
time-to-death  in  [25,+)  have  0  counts.  The  algorithm  replaced  these  with  the 
sum  of  the  counts  in  the  adjacent  rectangles  divided  by  the  total  number  of 
uncensored  patterns.  For  example,  the  rectangle  with  time-to-onset  interval 
[0,15)  and  time-to-death  interval  [25,+)  is  given  a  count  of  (16+5+4+7)/75 
(e.g.  32/75)  to  begin  the  EM  algorithm. 

The  final  probability  estimates  for  group  E  and  their  standard  deviations 
resulting  from  the  application  of  the  EM  algorithm  and  the  inversion  of  the 
information  matrix  are  given  in  Table  5.  The  corresponding  results  for  group 
C  are  given  in  Table  6. 

Table  5 

Joint  Probabilities  for  Group  E 


Time-to- Death 

Time-to-Onset 

[0,15) 

[15,20) 

[20,25) 

(25,+) 

death  w/o  disease 

Relative  Risk 

0.016599 

0.012030 

0.013348 

0.157371 

Std  Deviation 

0.008682 

0.003220 

0.003292 

0.047070 

[0,15) 

Relative  Risk 

0.016599 

0.006478 

0.003337 

0.126720 

Std  Deviation 

0.008682 

0.002403 

0.001663 

0.009579 

[15,20) 

Relative  Risk 

0 

0.007403 

0.005840 

0.149910 

Std  Deviation 

0 

0.002562 

0.002194 

0.010280 

[20,25) 

Relative  Risk 

0 

0 

0.005006 

0.073713 

Std  Deviation 

0 

0 

0.002033 

0.007522 

[25,+  ) 

Relative  Risk 

0 

0 

0 

0.405646 

Std  Deviation 

0 

0 

0 

0.046796 

Table  6 

Joint  Probabilities  for  Group  C 

Time-to- Onset 

[0,15) 

Time-to-Death 
[15,20)  [20,25) 

[25,+) 

death  w/o  disease 

Relative  Risk 

0.025804 

0.010770 

0.012631 

0.269909 

Std  Deviation 

0.004392 

0.001600 

0.001999 

0.034742 

[0,15) 

Relative  Risk 

0.009824 

0.002760 

0.001889 

0.076160 

Std  Deviation 

0.004215 

0.001139 

0.000826 

0.005621 

[15,20) 

Relative  Risk 

0 

0.003523 

0.003456 

0.128231 

Std  Deviation 

0 

0.001372 

0.001426 

0.008000 

[20,25) 

Relative  Risk 

0 

0 

0.006163 

0.158036 

Std  Deviation 

0 

0 

0.001594 

0.011426 

[25,+  ) 

Relative  Risk 

0 

0 

0 

0.290843 

Std  Deviation 

0 

0 

0 

0.032916 
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Inverting  the  information  matrices  for  each  group  and  computing  the  chi- 
square  statistic  for  the  difference  of  the  joint  probability  distributions  yields, 

(Pk  -  Pc)(Sij  +  Ilcr(PH  -  Pc)^  =  66.358, 

which  has  13  degrees  of  freedom  (because  there  are  14  possible  rectangles) 
and  a  p-value  less  than  0.0001.  Tables  7  and  8  give  the  relative  risks  of  a 
morbidity/mortality  event  in  rectangle  Rij.  The  overall  relative  risk  of  dying 
with  disease  is  1.1759  with  standard  deviation  0.092775. 

Table  7 
Relative  Risks 


Time-to- Onset 

[0,15) 

Time-to-Death 
[15,20)  [20,25) 

[25, -4-) 

death  w/o  disease 

Relative  Risk 

0.646281 

1.116966 

1.056730 

0.583053 

Std  Deviation 

0.353840 

0.341972 

0.309622 

0.189854 

[0,15) 

Relative  Risk 

1.689636 

2.347056 

1.766675 

1.663865 

Std  Deviation 

1.143107 

1.302293 

1.171078 

0.178857 

[15,20) 

Relative  Risk 

2.101399 

1.689729 

1.169062 

Std  Deviation 

1.094568 

0.942833 

0.108385 

[20,25) 

Relative  Risk 

0.812170 

0.466431 

Std  Deviation 

0.391059 

0.058335 

[25,-1-  ) 

Relative  Risk 

1.394724 

Std  Deviation 

0.225397 

Table  8 

Relative  Risks  by  Time-to-Death  Interval 


Time-to-Death 

Relative  Risk 

Standard  Deviation 

[0,15) 

1.689636 

1.143106 

[15,20) 

2.209311 

0.775555 

[20,25) 

1.232381 

0.361280 

|25,+  ) 

1.157267 

0.094997 
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Table  9  gives  the  lethality  estimates  by  time-to-death  intervals  and  their 
covariances. 


Table  9 

Lethality  Estimates  and  Covariances 


Group  E 

Time-to-Death 

Lethality 

[0,15) 

Covariance 
Time-to-Death 
[15,20)  [20,25) 

[25,-1-) 

[0,15) 

1.00000 

1.00000 

0.00000 

-0.00000 

-0.00000 

[15,20) 

1.15385 

-0.00000 

0.19117 

-0.00000 

0.00000 

[20,25) 

0.06250 

-0.00000 

-0.00000 

0.13696 

0.00000 

[25,+  ) 

4.80386 

-0.00000 

0.00000 

0.00000 

3.00545 

Group  C 

Time-to-Death 

Lethality 

[0,15) 

Covariance 
Time-to-Death 
[15,20)  [20,25) 

[25,+) 

[0,15) 

0.38072 

0.05104 

0.00000 

-0.00000 

-0.00000 

(15,20) 

0.58335 

0.00003 

0.04927 

-0.00000 

0.00000 

[20,25) 

0.91106 

-0.00000 

0.00000 

0.08222 

-0.00000 

|25,+  ) 

2.42034 

-0.00004 

0.00002 

-0.00002 

0.19371 

The  chi-square  test  statistic  with  4  degrees  of  freedom  for  testing  the 
equality  of  the  two  vectors  of  lethalities  given  in  Table  9  was  3.5989  with 
p-value  0.4630. 
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Appendix  1:  Manual  for  Implementing  the  SAS^^ Macros 
for  Non-parametric  Estimation  of  the  Joint  Probability 
Distribution  of  Time-to-Onset  of  Disease  and  Time-to- 
Death 

Introduction 

This  manual  presents  a  set  of  SAS^^  macros  which  implement  the  methods 
for  morbidity/mortality  analysis  described  in  A  Non- Parametric  Analysis  of 
Morbidity /Mortality  Data  by  Mihalko  and  Michalek  (1998).  There  are  two 
main  macros  named  mortmorb  and  mm2.  All  of  the  macros  discussed  here 
appear  in  Appendix  2. 

The  mortmorb  macro  accepts  a  sample  of  disease  onset  times  and  times- 
to-death,  each  of  which  may  be  right,  left,  or  interval  censored.  Using  in¬ 
structions  from  the  user,  the  macro  formulates  a  set  of  rectangles  in  the  two 
dimensional  time-to-onset  of  disease  and  time-to-death  space  by  using  the 
uncensored  points,  including  those  with  a  known  time-to-death  and  disease 
known  not  to  have  occurred  before  death.  Each  observation  is  classified  ac¬ 
cording  to  its  censoring  pattern.  The  censoring  patterns  accommodated  are 
those  described  in  Tables  1  and  2  of  Mihalko  and  Michalek  (1998).  Using 
these  classifications  and  the  formulated  rectangles,  the  Estimation  Maximiza¬ 
tion  (EM)  algorithm  is  used  to  distribute  each  observation  into  its  possible  set 
of  rectangles.  The  final  set  of  pseudodata  is  used  to  compute  the  maximum 
likelihood  estimates  of  the  probabilities  of  the  rectangles.  The  algorithm  of 
Louis  (1982)  is  used  to  obtain  an  estimate  of  the  Fisher  information  matrix 
for  the  probability  estimates.  The  mm2  macro  works  in  the  same  way  as  the 
mortmorb  macro  except  that  it  accepts  a  prescribed  set  of  rectangles. 

This  manual  also  shows  how  to  use  the  mm2  macro  to  obtain  the  joint 
probability  estimates  for  two  independant  samples  using  the  same  set  of  rect¬ 
angles.  Section  2  describes  the  format  of  the  input  data.  Section  3  specifies 
the  arguments  for  the  main  macros,  mortmorb  and  mm2,  and  gives  an  ex¬ 
ample.  Section  4  details  the  components  of  the  two  macros.  Section  5  shows 
how  to  use  the  results  of  the  macros  to  test  for  a  difference  between  the  joint 
distributions  of  two  samples.  Appendix  2  gives  the  code  for  producing  esti¬ 
mates  of  the  joint  distributions  used  in  the  example  of  Mihalko  and  Michalek 
(1998). 

Input  Data  Format 

The  input  data  must  contain  five  variables  each  for  time-to-onset  of  disease 
and  time-to-death.  These  five  variables  describe  the  censoring  pattern  for 
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each  event  and  give  the  times  associated  with  the  censoring  pattern.  The 
names  and  definitions  of  these  required  variables  for  time-to-onset  of  disease 
are: 

obsT: 

ObsT  is  the  actual  observed  time-to-onset  of  disease.  If  time-to-onset  of 
disease  is  censored  on  one  side  then  obsT  is  the  censored  value.  If  time-to- 
onset  is  interval  censored,  then  obsT  =  . 

obsTL: 

ObsTL  is  equal  to  if  time-to-onset  is  not  censored  on  the  left  and  right, 
i.e.  interval  censored.  Otherwise,  obsTL  is  the  left  endpoint  of  the  censoring 
interval. 

obsTR: 

ObsTR  is  equal  to  if  time-to-onset  is  not  censored  on  the  left  and  right. 
Otherwise,  obsTR  is  the  right  endpoint  of  the  censoring  interval. 

iTL: 

ITL  is  the  indicator  of  left  censorship  for  time-to-onset  of  disease.  ITL  =  1  if 
time-to-onset  of  disease  is  left  censored  or  interval  censored  and  0  otherwise. 

iTR: 

ITR  is  the  indicator  of  right  censorship  for  time-to-onset  of  disease.  ITR 
=  1  if  time-to-onset  of  disease  is  right  censored  or  interval  censored  and  0 
otherwise. 

Death  known  to  have  occurred  before  disease: 

If  death  is  known  to  have  occurred  before  disease  then  obsT,  obsTL  and 
obsTR  should  all  be  recorded  as  missing,  and  iTL=0  and  iTR=l.  The 
reason  for  this  coding  is  that,  technically,  disease  has  been  right  censored  by 
death.  All  of  the  obsT,  obsTL  and  obsTR  variables  are  set  to  missing  so 
that  the  macro  can  distinguish  between  the  case  that  time-to-onset  did  not 
occur  before  death  and  the  case  that  nothing  is  known  about  time-to-onset 
of  disease. 

If  time-to-onset  of  disease  is  right  censored  then  iTL  =0,  iTR=l,  ob- 
sTL=obsTR  =  missing,  and  obsT  is  the  time  of  right  censorship.  This 
indicates  that  time-to-onset  of  disease  is  known  to  be  larger  than  the  value 
in  the  variable  obsT  or  that  death  occurred  before  time-to-onset  of  disease. 

The  variables  for  time-to-death  correspond  to  those  for  time-to-onset  of 
disease  and  are  listed  below. 
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obsD: 

ObsD  is  the  actual  observed  time-to-death.  If  time-to-death  is  censored 
on  one  side  then  obsD  is  the  censored  value.  If  time-to-death  is  interval 
censored,  then  obsD  =  missing. 

obsDL: 

ObsDL  is  equal  to  if  time-to-death  is  not  censored  on  the  left  and 
right.  Otherwise,  obsDL  is  the  left  endpoint  of  the  censoring  interval. 

obsDR: 

ObsDR  is  equal  to  if  time-to-death  is  not  censored  on  the  left  and 
right.  Otherwise,  obsDR  is  the  right  endpoint  of  the  censoring  interval. 

iDL: 

IDL  is  the  indicator  of  left  censorship  for  time-to-death.  IDL  =  1  if 
time-to-death  is  left  censored  or  interval  censored  and  0  otherwise. 

iDR: 

IDR  is  the  indicator  of  right  censorship  for  time-to-death.  IDR  =  1  if 
time-to-death  is  right  censored  or  interval  censored  and  0  otherwise. 


The  Macros  Mortmorb  and  MM2 

Two  ’driver’  macros,  named  mortmorb  and  mm2,  implement  the  methods 
of  Mihalko  and  Michalek  (1998).  Both  drivers  produce  maximum  likelihood 
estimates  of  a  discretized  version  of  the  joint  distribution  of  time-to-onset 
of  disease  and  time-to-death  and  the  associated  Fisher  information  matrix. 
The  driver  named  mortmorb  determines  the  rectangles  to  be  used  for  the 
joint  distribution  using  arguments  supplied  by  the  user.  The  macro  named 
mm2  accepts  a  set  of  rectangles  chosen  by  the  user. 

The  arguments  for  mortmorb  are; 

datafile: 

The  argument  datafile  is  the  name  of  a  SAS^'^  data  set  containing  the 
ten  variables  described  in  Section  2.  The  datafile  may  contain  other  vari¬ 
ables,  such  as  an  identification  number  for  the  observation  and  any  potential 
explanatory  variables.  An  output  data  set  is  produced  that  contains  all  of 
the  information  in  datafile  plus  the  results  of  the  EM  algorithm  for  each 
observation. 
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outprobs: 

The  argument  outprobs  should  be  a  SAS™  data  set  name,  permanent 
or  temporary.  After  completion  of  the  macro,  the  data  set  outprobs  will 
contain  one  observation  with  variables  named  newrows,  newcols  and  a  two 
dimensional  array  newp(newrows,  newcols).  The  variable  newrows  will  con¬ 
tain  the  number  of  intervals  into  which  time-to-onset  of  disease  has  been 
divided  for  the  joint  probability  distribution.  The  count  in  newrows  will 
also  include  a  category  of  missing  to  account  for  observations  for  which  it  is 
known  that  death  occurred  before  onset  of  disease,  if  there  were  any  such  ob¬ 
servations  with  uncensored  time-to-death.  The  variable  newcols  will  contain 
the  number  of  intervals  into  which  the  time-to-death  range  is  partitioned  for 
the  joint  distribution.  The  array  newp(newrows,newcols)  will  contain  the 
maximum  likelihood  estimates  of  the  joint  probabilities  for  the  array  of  rect¬ 
angles  defined  by  the  cross  tabulation  of  the  time-to-onset  of  disease  rows 
and  time-to-death  columns.  The  lower  triangular  part  of  the  submatrix  com¬ 
posed  of  this  matrix  without  the  row  corresponding  to  death  before  disease 
(if  such  a  row  is  included)  will  have  zero  values  for  the  probabilities  because 
disease  cannot  be  observed  after  death.  The  total  number  of  probabilities, 
zeros  included,  is  the  product  of  newrows  and  newcols.  Letting  this  product 
be  called  total,  the  array  newp  will  be  comprised  of  variable  newpl  through 
newptotal. 

pseudo: 

The  argument  pseudo  should  be  the  name  of  a  SAS^'^  data  set  which 
will  contain  all  of  the  original  data  from  the  input  data  set  named  in  the 
argument  datafile  and  an  array,  ps(newrows,newcols),  which  contains  the 
EM  algorithm  distribution  for  each  observation  in  the  newrows  by  newcols 
matrix  of  onset-of-disease  by  time-of-death  rectangles.  Recall  that  newrows 
and  newcols  are  variables  in  the  outprobs  data  set. 

intobs: 

The  argument  intobs  determines  the  minimum  number  of  uncensored  ob¬ 
servations  that  the  user  requires  in  each  interval  of  the  partitions  of  the  time- 
to-onset  space  and  the  time-to-death  space.  The  algorithm  for  determining 
the  original  partitions  cuts  the  two  time-to-event  ranges  into  the  maximum 
number  of  intervals  that  allow  each  interval  to  contain  at  least  intobs  uncen¬ 
sored  times.  The  interval  endpoints  for  both  time-to-onset  and  time-to-death 
are  then  combined  to  form  a  common  set  of  intervals  which  are  used  to  define 
the  original  set  of  time-to-onset  of  disease  by  time-to-death  rectangles.  The 
smaller  the  number  of  intobs,  the  larger  the  number  of  rectangles  with  which 
the  algorithm  begins.  The  number  of  rectangles  will  be  reduced  later  by  a 
macro  named  revzero  if  the  EM  algorithm  fails  to  put  a  positive  count  in  any 
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rectangle  for  which  it  is  possible  to  have  an  uncensored  observation.  This 
process  is  described  in  the  discussion  of  the  revzero  macro. 

tol: 

The  argument  tol  is  the  measure  of  convergence  desired  for  the  EM  al¬ 
gorithm.  The  EM  algorithm  iteratively  estimates  the  maximum  likelihood 
estimator.  After  each  iteration  of  the  EM  algorithm,  the  sum  of  squared  dif¬ 
ferences  between  the  current  and  past  estimates  is  computed  and  compared 
to  tol.  The  EM  algorithm  stops  when  the  sum  of  squared  differences  is  less 
than  tol. 

its: 

The  argument  its  is  the  maximum  number  of  iterations  of  the  EM  al¬ 
gorithm  that  the  user  will  allow.  Although  theory  implies  that  the  EM 
algorithm  converges  uniformly  to  the  maximum  likelihood  estimate,  it  does 
not  say  how  fast  it  converges.  In  simulations  using  the  mortmorb  macro 
convergence- was  obtained  by  20  iterations  using  a  tolerance  of  0.0001.  If 
convergence  does  not  occur  within  100  iterations  it  may  be  a  good  idea  to 
increase  the  value  of  intobs  and  start  again.  The  EM  algorithm  will  continue 
its  iterations  until  the  sum  of  squared  differences  described  in  the  previous 
paragraph  is  less  than  the  value  in  the  argument  tol  or  until  it  completes 
the  number  of  iterations  given  in  the  argument  its.  If  the  algorithm  stops 
without  converging,  the  log  window  will  contain  a  message  saying  that  con¬ 
vergence  was  not  obtained  in  its  iterations  and  give  the  values  of  the  sum  of 
squared  errors  for  the  last  iteration  of  the  algorithm. 

libdrive: 

The  argument  libdrive  gives  a  computer  address  which  the  macro  will  use 
to  define  the  library  in  which  the  formats  for  the  partitions  of  the  time-to- 
onset  of  disease  and  the  time-to-death  will  be  stored.  The  macro  produces  a 
SAS^^^  format  for  categorizing  the  two  times-to-e vents.  These  formats  will 
be  stored  cis  ?dis  for  the  time-to-onset  of  disease  and  ?deth  for  the  time-to- 
death.  where  ?  is  the  value  in  the  argument  pref,  which  stands  for  prefix. 
An  example  of  a  value  for  libdrive  is  c  :  \windows\desktop.  Notice  that  no 
quotation  marks  are  needed. 

inform: 

The  argument  inform  is  a  SAS^^  data  set  name  chosen  by  the  user 
to  contain  the  estimated  information  matrix  for  the  estimates  of  the  joint 
probability  distribution.  This  information  matrix  is  in  a  two  dimensional 
array  named  im.  This  array  will  correspond  to  the  vector  of  probabilities 
ordered  across  rows  and  down  columns  of  the  matrix  of  rectangles,  leaving 
out  the  impossible  rectangles.  The  last  probability  is  not  included  in  this 
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vector,  because  it  is  equal  to  one  minus  the  sum  of  the  other  probabilities. 
Hence,  if  there  are  14  possible  rectangles,  as  in  the  example  of  Mihalko  and 
Michalek  (1998),  the  information  matrix  will  have  dimensions  13  by  13. 

pref: 

The  argument  pref  is  the  prefix  that  the  user  chooses  to  be  put  on  the 
formats  for  time-to-onset  and  time-to-death.  Pref  can  have  at  most  4  charac¬ 
ters.  The  formats  for  time-to-onset  and  time-to-death  will  be  named  prefdis 
and  prefdeth,  respectively,  where  pref  is  replaced  by  the  character  string 
value  chosen  for  the  argument  pref.  In  addition,  two  other  formats,  named 
prefrow  and  prefcolumn,  are  saved  in  the  libref  argument  address.  These 
formats  replace  the  row  and  column  numbers  with  the  appropriate  row  and 
column  intervals  which  define  the  matrix  of  rectangles. 

The  arguments  for  mm2: 

The  difference  between  the  mortmorb  macro  and  the  mm2  macro  is  that 
the  mm2  macro  accepts  a  format  for  partitioning  the  times-to-onset  of  dis¬ 
ease  and  the  times-to-death.  These  formats  are  used  to  define  the  original  set 
of  time-to-onset  by  time-to-death  rectangles.  The  mm2  macro  then  produces 
the  same  output  as  the  mortmorb  macro.  In  a  two  sample  study,  mortmorb 
could  be  used  with  the  smaller  sample  to  determine  a  set  of  rectangles  and 
formats.  These  rectangles  could  be  then  used  in  mm2  with  the  second  sam¬ 
ple,  so  that  the  pseudodata  and  the  joint  probabilities  for  both  samples  are 
defined  for  the  same  set  of  rectangles. 

datafile,  outprobs,  pseudo,  tol,  its,  libdrive,  inform,  pref: 

These  arguments  are  the  same  as  those  of  the  same  name  in  the  mortmorb 
macro. 

disfmt  and  dethfmt: 

The  arguments  disfmt  and  dethftnt  are  the  names  of  the  formats  to  be 
used  for  the  time-to-onset  and  time-to-death  variables,  respectively.  These 
formats  must  be  stored  in  the  address  given  in  the  argument  libdrive. 

intimes: 

The  argument  intimes  is  the  name  of  a  SAS^^  data  set  containing  vari¬ 
ables  tl  through  t (number  of  times),  where  the  values  of  the  t’s  are  the  end 
points  of  the  intervals  defined  in  disfmt  and  dethfmt. 

Note  on  disfmt,  dethfmt  and  intimes: 

These  arguments  define  the  original  set  of  rectangles  that  are  used  to 
compute  the  starting  probabilities  for  the  EM  algorithm.  Generally  they  will 
be  the  output  formats  and  times  from  a  previous  run  of  the  mortmorb  macro 
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or  the  mm2  macro.  However,  the  only  requirement  is  that  the  times  be  the 
points  that  define  the  intervals  formatted  in  disfmt  and  dethfmt.  Not  all  of 
the  times  need  to  be  used  in  the  format  for  disfmt,  but  all  the  times  must  be 
used  in  the  dethfmt.  If  disfmt  defines  a  death-before-disease  category  and  K 
intervals,  while  dethfmt  defines  J  intervals,  then  the  first  J— 1  intervals  of  the 
dethfmt  must  be  the  first  J— 1  intervals  defined  by  disfmt.  The  Jth  interval 
defined  by  disfmt  must  be  the  rmion  of  the  last  K— J  intervals  defined  by 
dethfmt. 


The  Macro  Components 

The  main  macro  drivers,  mortmorb  and  mm2,  share  all  but  one  macro  com¬ 
ponent.  We  describe  each  macro  in  the  order  that  it  appears  in  mortmorb 
and  then  discuss  mm2  by  highlighting  the  small  differences  between  the  two 
drivers. 

eminvals  (infile, outfile,timesout  ,intobs,libdrive) : 

The  macro  eminvals  forms  the  original  rectangles  in  the  time-to-onset  of 
disease  by  time-to-death  sample  space.  The  input  data  set  is  named  in  the 
argument  infile,  which  must  have  the  form  described  above  for  the  mortmorb 
macro  argument  datafile.  The  macro  removes  the  uncensored  observations, 
including  those  with  death  is  known  to  have  occurred  before  the  onset  of  dis¬ 
ease,  and  orders  the  resulting  times-to-onset  of  disease  and  times-to-death 
separately.  For  each  set  of  times  to  disease  onset  and  death,  the  macro 
chooses  the  points  in  the  ordering  that  have  intobs  (as  defined  in  the  mort¬ 
morb  description)  times  between  them.  The  resulting  two  sets  of  times  are 
merged  into  one  set  of  distinct  times.  These  times  are  used  in  a  macro  named 
ddformat  to  create  formats  for  onset  times  and  times-to-death  as  well  as  for¬ 
mats  relating  the  row  and  column  indices  to  the  interval  definitions.  The 
formats  all  have  the  prefix  that  the  user  puts  in  the  argument  pref,  the  suf¬ 
fix  is  dis  for  the  onset  times,  deth  for  the  times-to-death,  row  for  the  row 
index  and  column  for  the  column  index.  The  format  for  onset  times  includes 
a  category  for  missing,  which  is  assigned  to  an  observation  that  has  death 
known  to  have  occurred  before  disease.  These  formats  are  used  with  PROC 
FREQ  to  compute  the  counts  of  the  uncensored  observations  in  each  of  the 
intervals. 

The  formats  are  stored  in  the  address  given  in  the  argument  libdrive.  The 
value  given  to  the  argument  timesout  will  be  the  name  of  a  data  set  containing 
one  observation  which  contains  the  times  comprising  the  endpoints  of  the 
intervals.  These  times  are  in  indexed  variables  tl  through  t  (number  of  times). 
The  value  given  to  the  argument  outfile  will  be  the  name  of  the  data  set  that 
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contains  the  output  of  the  PROC  FREQ.  This  output  is  a  table  of  counts  for 
the  uncensored  observations  and  the  rectangles  defined  by  the  two  formats. 

initprob  (infile, outfile) : 

The  macro  initprob  receives  an  output  SAS^^  data  set  from  PROC 
FREQ  named  in  the  argument  infile  and  produces  a  SAS^^  data  set  named 
by  the  value  of  outfile,  which  will  contain  a  set  of  initial  probabilities  for  the 
rectangles  of  the  table  defined  by  the  infile.  The  macro  determines  whether 
or  not  there  is  a  row  for  missing,  which  would  correspond  to  death  known 
to  have  occurred  before  disease.  If  so,  it  assigns  a  1  to  a  global  macro  vari¬ 
able  named  dwod  (death  without  disease).  If  there  were  no  deaths  known 
to  have  occurred  before  disease  then  dwod  is  assigned  0.  The  macro  assigns 
zeros  to  the  lower  triangle  of  the  matrix  if  dwod  is  0.  If  dwod  is  1,  then 
the- macro  assigns  zeros  to  the  lower  triangle  of  the  the  matrix  defined  by 
eliminating  the  first  row.  If  any  rectangle  that  can  legitimately  be  nonzero  is 
empty,  a  small  amount  of  probability  is  put  into  that  rectangle.  The  amount 
of  probability  put  into  rectangles  with  zero  counts  is  the  sum  of  the  counts 
of  adjacent  rectangles  divided  by  the  total  number  of  uncensored  observa¬ 
tions.  If  all  adjacent  rectangles  have  zero  counts  then  the  probability  put 
into  the  rectangle  is  1  divided  by  the  number  of  uncensored  observations. 
To  compute  the  initial  probabilities,  the  actual  counts  for  nonzero  rectangles 
and  the  ’adjusted’  counts  for  zero  rectangles  are  summed.  Then,  the  rectan¬ 
gle  probabilities  are  computed  by  di-vdding  the  count  or  adjusted  count  for 
each  rectangle  by  this  sum  of  counts  or  adjusted  counts.  These  initial  prob¬ 
abilities  are  put  into  one  observation  with  a  two  dimensional  array  named 
newp.  The  array  has  &;numrows  number  of  rows  and  fenumcols  number  of 
columns,  where  numrows  and  numcols  are  global  macro  variables  containing 
the  number  of  rows  and  number  of  columns  of  the  time-to-onset  of  disease 
by  time-to-death  matrix  of  rectangles.  This  two  dimensional  matrix  contains 
the  triangle  of  empty  rectangles.  Lastly,  a  global  macro  variable  named  total, 
the  total  number  of  rectangles,  is  produced. 

In  the  macro  mortmorb,  the  PROC  FREQ  output  SAS^^  data  set  pro¬ 
duced  by  eminvals  is  named  freqout  and  is  the  infile  to  initprob.  The  outfile 
produced  by  initprob  is  named  initprob. 

cellsgn  (infile, datafile, clasfile,numrow,numcol, dwod): 

The  macro  cellsgn  augments  datafile  with  the  row  and  column  of  the 
matrix  defined  in  infile.  The  arguments  numrow,  numcol  are  the  number  of 
rows  and  columns  of  the  matrix  of  rectangles,  while  dwod  is  the  indicator 
as  to  whether  or  not  there  is  a  row  for  death  without  disease.  The  resulting 
SAS^^  data  set  will  be  named  by  the  value  given  to  the  argument  clasfile. 
Infile  must  be  a  SAS^'^  data  set  containing  one  observation  with  an  array 
t  (numcol)  of  ordered  times,  of  the  type  put  out  in  the  timesout  argument  of 
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eminvals.  The  names  of  the  new  variables  in  clasfile  are  row,  rowL  and  rowR 
indicating  the  row  classification  of  obsT,  obsTL  and  obsTR,  respectively. 

Similarly  the  column  classifications  for  obsD,  obsDL  and  obsDR  are  column, 
columnL  and  columnR,  respectively. 

In  the  mortmorb  macro,  the  data  set  infile  is  named  times  and  is  the 
output  of  timesout  from  the  eminvals  macro.  The  arguments  numrow,  num- 
col  and  dwod  are  the  global  macro  variables  numrows,  numcols  and  dwod 
created  in  the  macro  initprob.  The  argument  for  datafile  is  the  original  mor¬ 
bidity/mortality  data  set  put  into  the  argument  datafile  of  the  mortmorb 
macro.  The  output  data  set  whose  argument  is  clasfile  is  called  outmm  in 
the  mortmorb  macro. 

classify  (infile , out  file , dwod) : 

The  macro  classify  augments  the  SAS™  data  set  given  in  the  argument 
infile  with  a  variable  named  type.  The  type  variable  contains  the  sequence 
number  of  the  censoring  pattern  defined  in  Tables  1  and  2  of  Mihalko  and 
Michalek  (1998).  The  data  set  named  in  infile  must  contain  the  morbid¬ 
ity/mortality  data  as  defined  for  the  datafile  argument  of  the  mortmorb 
macro.  The  resulting  SAS^'^  data  set  will  have  the  name  given  to  the  argu¬ 
ment  outfile. 

In  the  macro  mortmorb,  the  infile  to  classify  is  the  clasfile  named  outmm 
from  cellsgn.  The  outfile  is  also  named  outmm.  The  argument  dwod  is  given 
the  global  macro  variable  dwod  which  was  assigned  a  value  in  the  macro 
initprob.  At  this  point  in  the  mortmorb  macro,  the  original  data  file  has 
been  augmented  twice.  The  current  augmented  version  is  named  outmm 
and  contains  all  the  original  data,  the  censoring  type  for  each  observation 
(in  the  variable  named  type),  and  the  row  and  column  for  the  observed  and 
censoring  values  for  each  observation. 

At  this  time,  the  only  censoring  types  that  can  be  correctly  assigned  by 
the  classify  macro  are  the  24  types  listed  in  Mihalko  and  Michalek  (1998). 

If  the  data  includes  other  censoring  types  then  this  macro  as  well  as  the  em 
macro  that  follows  must  be  modified.  The  modification  should  be  straight¬ 
forward  given  the  form  of  the  code  in  these  two  macros. 

em(datafile,probfile,emout, probout, numrows, numcols, total, tol, 
its, dwod): 

The  em  macro  performs  the  EM  algorithm  on  observations  in  the  SAS^'^ 
data  set  named  in  the  argument  datafile.  The  datafile  SAS^'^  data  set  must 
contain  all  the  variables  described  for  the  datafile  argument  in  the  mortmorb 
macro  as  well  as  row  and  column  classifications  for  the  observed  and  censoring 
values  of  the  two  time-to-event  variables  and  a  variable  named  type  that 
contains  the  censoring  type.  The  arguments  for  this  macro  are  defined  below. 


24 


probfile: 

The  argument  probfile  is  a  SAS^^  data  set  containing  the  initial  prob¬ 
abilities  for  the  time-to-onset  of  disease  by  time-to-death  rectangles.  The 
data  set  has  one  observation  with  total  (the  product  of  numrows  and  num- 
cols)  variables  in  a  two  dimensional  array  newp(&:numrows,&:numcols),  where 
numrows  and  numcols  are  the  number  of  rows  and  the  number  of  columns 
for  the  matrix  defining  the  rectangles,  respectively. 

emout: 

The  value  given  the  argument  emout  is  the  name  of  a  SAS^^  data  set 
which  will  contain  the  original  data  in  datafile  and  the  EM  algorithm  dis¬ 
tributions  for  each  observation.  The  distributions  for  each  observation  are 
in  the  two  dimensional  array  named  ps(numrows, numcols).  For  each  obser¬ 
vation  the  double  sum  over  all  rows  and  all  columns  of  the  ps  array  should 
add  to  1.  The  em  macro  performs  this  test  after  each  iteration  and  puts  a 
message  into  the  log  window  if  the  distributions  of  the  observations  do  not 
add  up  to  the  total  sample  size. 

probout: 

The  argument  probout  holds  the  name  to  be  given  to  the  SAS^^  data  set 
that  contains  the  final  set  of  probabiUties  for  the  time-to-onset  of  disease  by 
time-to-death  rectangles.  The  form  of  this  data  set  is  the  same  as  that  of  the 
SAS^'^  data  set  named  in  the  argument  probfile,  including  the  same  array 
prefix,  newp.  The  values  of  this  observation  axe  the  maximum  likelihood 
estimates  of  the  rectangle  probabilities. 

numrows,  numcols,  total,  tol,  its,  dwod: 

The  arguments  numrows,  numcols  and  total  are  the  number  of  rows, 
columns  and  rectangles  in  the  matrix  of  rectangles.  The  arguments  tol  and 
its  are  the  same  as  described  in  the  main  mortmorb  macro.  The  argument 
dwod  is  the  value  1  or  0  to  indicate  whether  the  data  set  contains  observations 
where  death  is  known  to  have  occurred  before  disease. 

How  the  em  macro  is  used  in  the  mortmorb  macro: 

The  argument  datafile  is  outmm,  the  twice  augmented  version  of  the 
original  datafile.  The  data  set  of  initial  probabilities  computed  in  the  macro 
initprob  and  named  initprob  is  used  as  the  probfile  argument.  The  argument 
emout  is  given  the  name  that  the  user  chose  for  the  argument  pseudo  in 
the  main  macro,  mortmorb.  The  user-specified  name  that  was  put  into  the 
argument  outprobs  in  the  main  macro  is  used  for  the  argument  probout. 
The  arguments  numrows,  numcols,  total  and  dwod  are  those  computed  by 
initprob.  The  arguments  tol  and  its  are  the  values  chosen  by  the  user  in  the 
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revzero(probs, times, numrows,numcols,numtimes, 
probpref,newprobs,newtimes,dwod): 

This  macro  checks  for  zero  values  for  the  possible  rectangles.  If  there  is 
a  zero  count  in  any  possible  rectangle  then  the  matrix  in  probs  is  collapsed 
about  that  rectangle.  After  the  collapsing,  the  search  for  zero  rectangles  is 
repeated.  The  process  stops  after  the  collapsing  of  the  table  eliminates  all 
zero  rectangles  outside  of  the  impossible  rectangles.  Each  time  the  table  is 
collapsed  an  endpoint  defining  the  rectangles  is  eliminated.  The  new  set  of 
endpoints  are  put  into  a  data  set  named  in  the  argument  newtimes.  The 
newtimes  SAS^^  data  set  has  a  one  dimensional  array  with  prefix  t.  A 
global  macro  variable  named  newtime  is  produced  which  contains  the  count 
of  times  in  the  data  set  newtimes.  A  global  macro  variable  named  yes  is 
also  produced.  Yes  is  equal  to  0  if  the  matrix  of  probabilities  in  the  data 
set  probs  was  not  changed  and  equal  to  1  if  it  was  changed.  The  new  set  of 
probabilities  is  in  a  data  set  named  by  the  value  of  the  argument  newprobs. 
The  data  set  newprobs  has  one  observation  and  a  two  dimensional  array 
named  p(newrows,newcols),  where  newrows  and  newcols  are  global  macro 
variables  which  give  the  number  of  rows  and  columns  in  the  new  matrix  of 
probabilities.  If  revzero  changes  the  matrix  of  rectangles,  a  message  will 
appear  in  the  log  window. 

The  arguments  for  revzero: 

probs, probpref,  numrows,  numcols: 

The  argument  probs  holds  one  observation,  which  has  a  two  dimensional 
array  named  probpref(numrows,numcols). 

times,  numtimes: 

The  argument  times  holds  the  name  of  a  SAS^'^  data  set  which  contains 
the  endpoints  that  define  the  intervals  for  the  rectangles.  Times  must  have 
one  observation  which  has  an  array  named  t (numtimes). 

newprobs,  newtimes: 

The  arguments  newprobs  and  newtimes  are  names  chosen  by  the  user 
for  SAS^^  data  sets  to  hold  the  new  set  of  collapsed  probabilities  and  the 
new  set  of  interval  endpoints  that  resulted  from  the  collapsing  of  zero  rect¬ 
angles.  Newprobs  has  one  observation,  which  is  a  two  dimensional  array 
named  probpref(newrows,newcols),  where  newrows  and  newcols  are  global 
macro  variables  produced  by  the  macro.  Newtimes  has  one  observation  with 
an  array  named  t (newtimes),  where  newtimes  is  a  macro  variable  giving  the 
number  of  endpoints  left  after  the  collapsing. 

dwod: 

The  argument  dwod  indicates  with  a  1  that  there  are  observations  for 
which  death  is  known  to  have  occurred  before  disease  and  it  is  0  otherwise. 
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How  the  revzero  macro  is  used  in  the  mortmorb  macro: 

The  revzero  macro  receives  the  matrix  of  probabilities  produced  by  the 
macro  em  and  collapses  it  around  possible  rectangles  that  have  zero  counts. 
This  amounts  to  combining  rectangles  in  the  original  time-to-onset  by  time- 
to-death  matrix  so  that  there  are  no  possible  rectangles  that  have  an  esti¬ 
mated  probability  of  zero.  Although  zero  is  a  legitimate  estimated  proba¬ 
bility,  a  covariance  matrix  for  the  set  of  probabilities  that  contains  a  zero 
cannot  be  computed. 

If  the  macro  variable  yes  is  1,  then  it  is  neccessary  to  repeat  the  macros 
cellsgn  and  classify  with  the  new  rectangles  and  times.  It  is  also  neccessary 
to  rewrite  the  formats  defining  the  intervals  for  time-to-onset  of  disease  and 
time-to-death.  All  this  is  done  by  the  appropriate  macros  described  earlier. 
The  resulting  rectangle  memberships  and  censoring  types  are  put  into  the 
macro  em  to  compute  maximiun  likelihood  estimates  for  the  probabilities  of 
the  new  rectangles. 

eminfo(pseudata, nrows,ncols, inform, rowcol): 

The  macro  eminfo  uses  the  method  of  Louis  (1982)  to  estimate  the  infor¬ 
mation  matrix  for  the  estimates  of  the  joint  distribution  of  time-to-onset  of 
disease  and  time-to-death  using  the  final  pseudodata  produced  by  the  EM 
algorithm.  The  arguments  for  this  macro  are: 

pseudata,  nrows,  ncols: 

The  argument  pseudata  is  the  name  of  the  data  set  that  contains  the  final 
pseudodata  produced  by  the  EM  algorithm.  Each  observation  of  the  data 
set  must  contain  two  arrays,  ps(nrows,ncols)  and  np(nrows, ncols),  where  the 
arguments  nrows  and  ncols  axe  the  numbers  of  rows  and  columns  of  the 
time-to-onset  by  time-to-death  array  of  rectangles.  The  array  ps  contains 
the  contributions  of  each  observation  to  each  rectangle  as  determined  by  the 
EM  algorithm.  The  array  np  contains  the  final  EM  algorithm  estimates  of 
the  joint  probabilities  of  the  rectangles. 

inform: 

The  value  of  the  argument  inform  is  the  name  of  the  output  SAS^'^  data 
set  that  contains  the  information  matrix  for  the  estimates  of  the  joint  proba¬ 
bility  distribution.  This  data  set  has  one  observation  containing  the  estimate 
of  the  information  matrix  in  a  two  dimensional  array  named  im(infdim,infdim), 
where  infdim  is  the  number  of  rectangles  with  nonzero  probabilities  minus 
one.  The  dimension  is  one  less  than  the  number  of  possible  rectangles  since 
the  probabilities  must  sum  to  1.  The  observation  also  contains  a  one  dimen¬ 
sional  array  named  newp(infdim),  which  contains  the  vector  of  probabilities 
for  the  possible  rectangles  of  the  matrix.  The  data  set  inform  also  contains 
numrows,  numcols  and  infdim,  which  are,  respectively,  the  number  of  rows 
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and  columns  of  the  matrix  of  rectangles  and  the  dimension  of  the  information 
matrix. 

The  SAS^'^  data  set  inform  also  contains  the  row  and  column  numbers 
for  possible  rectangles  in  the  matrix  of  rectangles.  These  numbers  are  con¬ 
tained  in  two  one  dimensional  arrays  ni(infdim)  and  nj(infdim),  which  are 
the  row  and  column  numbers  respectively,  that  identify  the  vector  newp  of 
the  inform  data  set  with  the  original  set  of  joint  probabilities.  For  example, 
the  probability  newp(s)  in  the  array  newp  of  the  output  data  set  inform  is 
assigned  to  the  rectangle  defined  by  the  ni(s)  row  and  the  nj(s)  column  in 
the  matrix  of  time-to-onset  by  time-to-death  rectangles.  This  information  is 
needed  to  identify  estimates  and  their  estimated  variances  computed  from 
the  newp  and  im  arrays. 

How  the  eminfo  macro  is  used  in  the  mortmorb  macro: 

The  macro  eminfo  uses  the  emout  data  set  named  in  the  argument  pseudo 
of  the  mortmorb  declaration  as  the  value  of  the  argument  pseudata.  The 
values  of  the  arguments  nrows  and  ncols  are  the  values  of  the  global  macro 
variables  numrows  and  numcols  as  determined  by  the  macro  initprob  (if  there 
were  no  zero  counts  in  the  possible  rectangles)  or  by  the  macro  revzero  (if 
some  rectangles  had  to  be  combined).  The  information  matrix  is  named  by 
the  mortmorb  macro  argument  inform.  The  argument  rowcol  is  renamed 
prefrowcol,  where  pref  is  the  value  of  the  argument  pref  in  the  argument  list 
of  the  macro  mortmorb. 

The  macro  mm2: 

The  macro  mm2  produces  the  same  output  that  is  produced  by  the  macro 
mortmorb.  The  only  difference  between  the  two  macro  drivers  is  the  method 
by  which  the  initial  rectangles  are  chosen.  The  macro  mortmorb  uses  the 
macro  eminvals  and  the  user-defined  argument  intobs  to  determine  the  initial 
set  of  rectangles.  In  place  of  the  macro  eminvals,  mm2  uses  a  macro  named 
realfreq,  which  uses  the  formats,  disfmt  and  dethfmt,  to  define  the  rectangles 
and  produce  the  PROC  FREQ  output  of  counts  of  the  uncensored  data 
for  these  rectangles.  This  output  is  used  by  initprob  to  produce  the  initial 
probabilities  for  the  EM  algorithm  in  the  mm2  macro.  From  this  point 
forward,  mm2  and  mortmorb  are  the  same. 

How  mortmorb  and  mm2  work  together: 

In  order  to  compare  the  joint  distributions  for  two  samples,  the  distri¬ 
butions  must  be  defined  on  the  same  set  of  rectangles.  Mortmorb  can  be 
used  on  the  smaller  of  the  samples  to  obtain  a  set  or  rectangles  and  the  joint 
probability  and  information  matrix  estimates.  Mortmorb  also  produces  the 
set  of  times  and  formats  for  the  rows  and  columns  of  the  matrix  of  rect¬ 
angles.  These  data  sets  can  be  put  into  mm2  with  the  second  sample  to 
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obtain  estimates  for  the  second  sample.  Problems  may  arise  if  mm2  needs 
to  combine  some  of  the  rectangles.  Recall  that  if  rectangles  need  to  be  com¬ 
bined  by  the  revzero  macro,  a  message  will  appear  in  the  log  window.  In  this 
case  mm2  can  be  used  again  with  the  second  sample  rectangle  formats  and 
times  to  reclassify  and  estimate  the  joint  distribution  for  the  first  sample. 
Another  strategy  is  to  predetermine  the  set  of  rectangles  and  run  mm2  on 
both  samples  with  formats  and  times  defining  the  rectangles. 

Summary  of  How  the  Macros  Work: 

Both  macros,  mortmorb  and  mm2,  work  basically  the  same  way.  The  only 
difference  is  in  the  way  in  which  the  original  rectangles  are  created.  Mort¬ 
morb  uses  the  uncensored  data  with  a  user-specified  number  of  uncensored 
times  between  endpoints  to  create  the  rectangles,  while  mm2  uses  rectan¬ 
gles  determined  by  the  user.  The  initial  joint  probabilities  needed  to  begin 
the  EM  algorithm  process  are  computed  firom  the  frequencies  of  the  uncen¬ 
sored  observations  in  the  original  rectangles.  If  there  are  no  zero  counts  in 
the  possible  rectangles  (i.e.  those  for  which  time-to-onset  of  disease  is  less 
than  or  equal  to  time-to-death  or  where  death  is  known  to  have  occurred  be¬ 
fore  disease)  then  the  initial  probabilities  are  the  sample  proportions  of  the 
uncensored  observations  in  the  rectangles.  If  some  of  the  possible  rectangles 
have  zero  counts  then  those  with  zero  counts  are  given  a  small  positive  count. 
The  small  amount  added  to  a  zero  count  rectangle  is  the  larger  of  the  sum 
of  the  counts  for  adjacent  rectangles  divided  by  the  number  of  uncensored 
observations  and  the  reciprocal  of  the  number  of  uncensored  observations. 
This  assignment  of  initial  probabilities  is  done  by  the  macro  initprob.  The 
macro  cellsgn  first  determines  whether  or  not  the  data  contain  any  obser¬ 
vations  for  which  death  is  known  to  have  occurred  before  the  disease.  If 
this  is  the  case,  then  the  global  macro  variable  dwod  is  assigned  a  value 
of  one.  Otherwise,  dwod  is  assigned  the  value  zero.  Cellsgn  then  deter¬ 
mines  to  which  row  and  column  of  the  matrix  of  rectangles  the  observed 
times  or  censoring  times  belong.  These  rectangle  memberships  are  appended 
to  the  datafile.  This  appended  datafile  is  the  input  for  the  macro  classify, 
which  determines  each  observation’s  censoring  type.  The  censoring  type  for 
each  observation  is  appended  to  the  input  file  under  the  variable  name  type. 
The  twice  appended  datafile  is  the  input  for  the  macro  em,  which  uses  the 
rectangle  membership  and  censoring  pattern  information  to  distribute  each 
observation ,  among  the  rectangles.  The  EM  algorithm  stops  when  the  sum 
of  squared  differences  between  successive  joint  probability  estimates  is  less 
than  the  user-specified  tolerance  in  the  argument  tol  or  when  the  number 
of  interations  exceeds  the  user-specified  maximum  number  of  iterations  in 
the  argument  its.  The  array  of  distributions  among  the  rectangles  for  each 
observation  is  the  pseudodata.  The  macro  revzero  sums  the  contributions 
of  each  observation  in  the  pseudodata  array  for  each  rectangle.  If  there  are 
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any  zero  counts  in  the  possible  rectangles,  then  the  macro  collapses  rows  and 
columns  around  the  zero  rectangles  until  all  of  the  possible  rectangles  have 
positive  pseudocounts.  The  estimates  of  the  joint  probabilities  are  the  sam¬ 
ple  proportions  calculated  from  the  pseudodata.  If  collapsing  was  necessary, 
these  probabilities  are  used  in  the  em  macro  as  initial  probabilities  and  a 
new  set  of  pseudodata  and  estimates  are  calculated.  The  resulting  probabil¬ 
ity  estimates  and  the  pseudodata  are  the  input  for  the  macro  eminfo,  which 
estimates  the  information  matrix  for  the  estimates  of  the  joint  probabilities 
using  the  method  of  Louis  (1982). 

The  final  probability  estimates  are  output  in  the  user-specified  value  of 
the  argument  outprobs  in  a  one  dimensional  matrix  newp.  The  information 
matrix  is  in  a  SAS™  data  set  named  by  the  user  in  the  argument  inform 
in  a  two  dimensional  matrix  im.  This  data  set  also  contains  contains  two 
one  dimensional  matrices  named  ni  and  nj  which  give  the  row  and  column 
numbers  for  the  components  of  the  probability  estimate  vector.  A  SAS^'^ 
data  set  named  by  the  user  in  the  argument  pseudo  contains  the  resulting 
pseudodata  as  well  as  the  final  times  that  define  the  final  rectangles,  the 
number  of  rows  and  columns  and  the  value  of  dwod  in  variables  newrows, 
newcols  and  dwod.  The  formats  for  the  row  and  column  intervals  that  define 
the  rectangles  are  in  formats  prefdis  and  prefdeth,  where  pref  is  a  one  char¬ 
acter  prefix  chosen  by  the  user  in  the  main  list  of  arguments,  in  the  hbrary 
named  in  the  argument  libdrive.  Also  in  this  library  are  two  other  formats 
named  prefrow  and  prefcolumn,  which  format  the  row  and  column  indices 
with  the  row  and  column  interval  definitions. 


An  Example  Using  Morbidity/ Mortality  Data 

This  example  uses  a  predetermined  format  for  the  time-to-onset  and  time- 
to-death  variables.  The  macro  mm2  is  run  twice.  First,  it  is  run  on  the 
data  for  a  group  of  exposed  subjects  with  predefined  formats.  Then  the  four 
formats  produced  by  this  run  of  mm2  are  put  into  mm2  using  the  control 
data.  Examination  of  the  log  shows  that  the  control  run  did  not  change  the 
rectangles,  so  that  all  of  the  formats  put  out  by  mm2  in  the  two  runs  are 
the  same.  The  data  contains  a  variable  Id  that  identifies  an  observation  as 
that  of  a  member  of  the  exposed  group  (Id  =  E)  or  of  the  control  group  (Id 
=  C).  The  variables  used  to  create  the  censoring  patterns,  observations  and 
censoring  times  are  yotourl  (year  of  subject’s  entry  into  the  environment, 
referred  to  as  tour  1),  yonset  (year  of  onset),  yod  (year  of  death),  mortstat 
(mortality  status,  mortstat  =  D  means  that  the  subject  was  already  dead 
at  the  time  this  data  was  compiled)  and  cancer  (cancer  =  1  indicates  the 
subject  was  known  to  have  cancer,  cancer  =  0  means  the  subject  was  known 
not  to  have  cancer,  cancer  =  .  means  that  the  subject’s  cancer  state  was 
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unknown.). 

The  following  SAS^'^  data  step  computes  time-to-onset  of  disease,  t,  and 
time-to-death,  d,  from  year  of  entering  the  environment.  The  code  also  cre¬ 
ates  the  new  variables  needed  for  the  mortmorb  and  mm2  macros.  This  code 
is  included  under  the  name  cancer.sas  in  the  file  named  example.  In  order 
to  run  this  example  the  file  mortmorb  must  be  put  on  the  desktop  and  have 
the  address  c  :  \wind(nus\desktop\Tnortmc>rb. 


/***Cancer.sas****/ 

libname  mm  'c  :  \windows\desktop\mortmorb'-,/ /  data  mm. cancer;//  infile 
'c  :  \windows\desktop\rriortrnorb\cancer.dat! \/ /  input 
Casenr  $1-5 

Id  $7  /*  R=Exposed,  C=Control*/ 

Race  $9-10  /*NB=Non  Black  BL=Black*/ 

Occup  12  Yob  14-15 

Mob  17-1 

Yotourl  21-22 

Motourl  24-25 

Mortstat  $27  /*D=Death*/ 

ica  $28  /*  Legend  for  Cause  of  Death*/ 

Yod  30-31 
Mod  33-34 
Cancer  36 
Yonset  38-39 

Pptr  41-47  /*  Format  7.3*/ 

BodyFat  49-54  /*  Format  6.3*/ 

run; 

data  mm. cancer; 
set  mm. cancer; 

if  yotourl  gt  yonset  and  yonset  ne  .  then  delete; 
if  yonset  gt  yod  and  yod  ne  .  then  delete; 
if  yonset  =  .  then  t  =  .; 

else  t  =  yonset  -  yotourl;  /*time  to  disease  from  tourl.*/ 
if  yod  =  .  then  d  =  .; 

else  d  =  yod  -  yotourl;  /*time  to  death  firom  tourl.*/ 

current  =  97  -  yotourl;  /*time  from  tourl  to  present*/ 

if  t  =  .  and  cancer  =  .  and  d  =  .  then  do;  /*Type  23.  No  information 

about  cancer  status  and  death  is  right  censored.*/ 

obstl  =  .; 

obstr  =  .; 
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obst  =  0; 

itl=0; 

itr=l; 

obsdl  = 

obsdr  = 

obsd  =  current; 

idl  =  0; 

idr  =  1; 

end; 

else  if  t  =  .  and  cancer  =  0  and  d  =  .  then  do;  /*Type  19.  Cancer  is 

known  not  to  have  occurred  by  the  time  death  was  right  censored.*/ 

obstl  =  .; 

obstr  =  .; 

obst  =  .; 

itl=0; 

itr=l; 

obsdl  =  .; 

obsdr  =  .; 

obsd  =  current; 

idl  =  0; 

idr  =  1; 

end;  else  if  t  ne  .  and  d  =  .  then  do;  /*Type  5.  Time-to-onset  is  known 

and  time-to-death  right  censored.*/ 

obstl  =  .; 

obstr  =  ..; 

obst  =  t; 

itl=0; 

itr=0; 

obsdl  =  .; 

obsdr  =  .; 

obsd  =  current; 

idl  =  0; 

idr  =  1: 

end; 

else  if  t  =  .  and  cancer  =  .  and  d  ne  .  then  do;  /*Type  21.  No  information 

about  disease  and  time-to-death  is  known.  */ 

obstl  =  obstr  =  .; 

obst  =  0: 

itl=0; 

itr=l; 

obsdl  =  .; 

obsdr  =  .; 
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obsd  =  d; 
idl  =  0; 
idr  =  0; 
end; 

else  if  t  =  .  and  cancer  =  0  and  d  ne  .  then  do;  /*Type  17.  Cancer  is 

known  not  to  have  occurred  before  death  and  time-to-death  known.*/ 

obstl  =  .; 

obstr  =  .; 

obst  =  .; 

itl=0; 

itr=l; 

obsdl  =  .; 

obsdr  =  .; 

obsd  =  d; 

idl  =  0; 

idr  ==  0; 

end; 

else  if  t  =  .  and  cancer  =  1  and  d  =  .  then  do;  /*Type  8.  Cancer  is  known 

to  have  occurred  before  and  time-to-death  is  right  censored.*/ 

obstl  =  .; 

obstr  =  .; 

obst  =  current; 

itl=l; 

itr=0; 

obsdl  =  .; 

obsdr  =  .; 

obsd  =  current; 

idl  =  0; 

idr  =  1; 

end; 

else  if  t  ne  .  and  d  ne  .  then  do;  /*Type  1.  Both  time-to-onset  and 

time-to-death  are  known.*/ 

obstl  =  .; 

obstr  =  .; 

obst  =  t; 

itl=0; 

itr=0; 

obsdl  =  .; 

obsdr  =  .; 

obsd  =  d; 

idl  =  0; 

idr  =  0; 
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end; 

run; 

The  next  data  step  divides  the  data  set  into  exposed  and  controls.  This 
file  is  named  split. sas  in  the  example  folder. 


Data  mm.exposed  mm.control 
set  mm. cancer; 

if  id  =  ’E’  then  output  mm.exposed; 

else  output  mm.control; 

run; 

The  next  step  is  to  create  the  original  formats.  In  this  example  we  use 
five  year  intervals.  Because  there  were  very  few  events  in  the  first  ten  years, 
the  first  interval  is  zero  to  fifteen.  The  SAS^^  code  below  sets  up  the  two 
formats  for  time-to-onset  and  time-to-death  as  well  as  the  endpoints  needed 
as  input  to  the  mm2  macro.  This  code  is  in  YrSfmt.sas  in  the  example  folder. 


libname  library  'c  :  \windows\de$ktop\rnortrnorb'-, 
proc  format  library  =  library; 
value  dis  .  =  ’death  wo  disease’ 

0-15  =  ’[0,15)’ 

15-20  =  ’[15,20)’ 

20-25  =  ’[20,25)’ 

25-30  =  ’[25,30)’ 

30-35  =  ’[30,35)’ 

35-high  =  ’[35,+)’; 
run; 

proc  format  library  =  library; 
value  deth  0-  15  =  ’[0,15)’ 

15-  20  =  ’[15,20)’ 

20-  25  =  ’[20,25)’ 

25  -30  =  ’[25,30)’ 

30-  35  =  ’[30,35)’ 

35-  high  =  ’[35,+)’; 
run; 


data  times; 
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input  tl-t5; 
cards; 

15  20  25  30  35 
run; 

We  are  now  ready  to  run  nim2  on  the  data.  The  following  code,  called 
Fixboth.sas  in  the  example  folder,  runs  mm2  on  the  exposed  data  using  the 
the  formats  and  times  just  created.  The  macro  mm2  then  puts  out  the  set  of 
times  with  four  formats  for  the  time-to-disease  and  time-to-death  intervals. 
These  possibly  new  times  and  formats  rdis  and  rdeth  are  put  into  mm2  with 
the  control  data.  The  log  does  not  indicate  that  the  times  needed  to  be 
changed  on  this  second  run,  so  that  we  know  the  four  formats  cdis,  cdeth, 
crow  and  ccolumn  are  the  same  as  the  corresponding  ones  that  apply  to  the 
exposed  cohort. 


/***Fixboth.sas*****/ 
options  ls=78  nonotes; 

libname  mm  'c  :  \windaws\desktop\rnortrnorb'-, 

%include  'c  :  \windows\desktop\mm2.sa$'; 

%mm2(mm.exposed,mm.rprobs,mm.rpseu,dis,deth,times,. 0001, 100, 
c  :  \windows\desktop\mortmorb,mm.nnfoim,r)', 

%mm2(mm. control, mm. cprobs,mm.cpseu,rdis,rdeth,times,  .0001,100, 
c  :  \windows\desktop\Tnortmorb^mm.cmfoTm  ,c); 

We  now  have  estimates  of  the  joint  probabilities  for  a  set  of  time-to- 
onset  of  disease  and  time-to-death  variables  and  their  information  matrices 
for  both  groups.  The  data  sets  mm.rinform  and  mm.cinform  contain  both 
the  information  matrices  and  the  vector  of  the  first  K— 1  out  of  K  (in  our 
case,  13  out  of  14)  final  joint  probability  estimates.  In  addition,  these  inform 
data  sets  contain  arrays  named  Ni  and  Nj  of  length  K— 1  each,  holding  the 
original  coordinates  of  the  probabilities.  This  information  is  all  that  is  needed 
to  compare  the  joint  probability  distributions.  In  order  to  compute  and 
compare  functions  of  the  joint  probabilities  we  need  to  complete  the  vector 
of  joint  probabilities  and  compute  the  covariance  matrices  for  the  full  set 
of  probabilities.  We  will  then  be  ready  to  use  PROC  IML  of  SAS^'^  to 
compute  functions  of  the  joint  probabilities  and  the  covariance  structures  of 
these  functions. 

The  following  SAS^^  code  receives  the  output  of  the  two  mm2  runs 
and  prepares  it  for  use  with  PROC  IML.  This  code  is  in  the  file  named 
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Prepjont.sas  in  the  example  folder.  This  code  also  prints  the  tables  of  joint 
probabilities  and  their  standard  deviations  for  the  exposed  and  the  control 
groups.  It  also  prints  a  side-by-side  table  of  the  joint  probabilities  with  a 
title  that  gives  the  results  of  the  chi-square  test  of  equality  for  the  two  dis¬ 
tributions.  The  data  sets  rfilld  and  child  contain  the  vector  of  probability 
estimates  and  the  complete  covariance  matrix  and  the  coordinates  for  the 
original  rectangle  for  each  estimate. 


/***prepjont.sas**/ 

%include  'c  :  \'windows\desktop\Tnortmorb\matrx.sas'-, 

%include  'c  :  \windows\desktop\mortmarb\estcov.sas'', 

%include  'c  :  \windows\desktop\mortmarb\fillout.sas'\ 

%include  'c  ;  \windows\desktop\mortmorb\quaddiff.sas'\ 

%matrx  (mm .  rinform,  rmats) ; 

%matrx(mm.cinform,cmats);  /**Puts  information  matrices  into  matrix  in¬ 
form.*/ 

%estcov(rmats,rcov) ; 

%estcov(cmats,ccov);/*creates  covariance  matrix  from  information.*/ 
%pstdtab(rcov,rrow,rcolumn,10,8.6,Exposed  Joint  Probabilities); 
%pstdtab(ccov, crow, ccolumn, 10, 8.6, Control  Joint  Probabilities); 

/* Joint  probabilities  and  their  standard  deviations 
are  tabled  in  these  last  two  calls.**/ 

%£llcov(rcov,rhlld) ; 

%£llcov(ccov,cHlld);  /*puts  out  the  complete  covariance  matrix 
for  all  probabilities.*/ 

%quaddiff(r£lld,c£lld,rcolumn,jointprobs,Exposed, Controls); 

/*Compares  the  joint  probability  distributions*/ 


We  now  have  two  data  sets,  rhUd  and  child,  containing  the  joint  probabil¬ 
ity  distributions  and  their  covariance  martrices  for  the  exposed  and  control 
groups  on  the  same  set  of  rectangles.  These  data  sets  are  in  a  form  that 
allows  PROC  IML  to  be  used  on  them  directly.  We  can  now  use  these  data 
sets  to  compute  a  maximum  likelihood  estimate  of  any  function  of  the  the 
joint  distributions  as  well  as  the  covariance  structure  of  the  function.  As  two 
examples  of  doing  this  we  compute  relative  risk  and  lethality. 

The  following  code  computes  the  relative  risk  of  death  with  cancer  for 
the  exposed  versus  the  controls.  The  results  are  displayed  and  discussed  in 
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Mihalko  and  Michalek  (1998).  The  purpose  here  is  to  show  how  some  of  the 
macros  in  the  mortmorb  folder  are  used.  We  compute  the  relative  risk  of  the 
a  subject’s  morbidity/mortality  experience  being  in  a  particular  rectangle, 
the  relative  risk  of  dying  with  disease  in  a  particular  time-to-death  interval, 
and  the  overall  relative  risk  of  dying  with  disease. 

The  code  below  is  in  the  file  named  allRR.sas  in  the  example  file.  This  file 
is  a  driver  which  calls  up  three  other  macros  listed  in  the  include  statement. 
Each  of  these  macros  calls  other  macros  that  do  the  following.  First,  the 
macro  determines  which  joint  probabilities  are  needed.  It  then  calls  a  macro 
which  extracts  these  probabilities  and  the  corresponding  covariance  matrix 
from  the  input  file  rfilld  or  cfilld.  Other  macros  compute  sums  and  ratios 
of  the  probabilities.  Both  relative  risk  and  lethality  are  ratios  of  sums  of 
probabilities  from  the  joint  distributions.  For  each  function,  a  special  macro 
is  needed  to  apply  the  delta  method  for  finding  the  covariance  structure  of 
the  function  being  computed.  These  macros  also  call  up  the  tabling  macro, 
which  tables  the  estimates  and  their  standard  deviations  in  terms  of  the  orig¬ 
inal  rectangles  or  columns,  whichever  is  appropriate. 


libname  library  'c  :  \windows\deskt<yp\moTtrnc)rh'] 
%include  'c  :  \windows\desktop\mortmorb\relrskij.sas'; 
%include  'c  :  \windows\desktop\mortmorb\colrsk.sas'; 
%include  'c  :  \windows\desktop\mortmorb\totrlrsk.sas'-, 


options  nonotes; 

%relrisk(rfilld,cfilld, Relative  Exposed  to  Control); 

%let  t  =  Relative  Risk  of  dying  with  disease  Exposed  to  Control; 

%colrsk  (rfilld ,  cfilld  ,r  column  ,final ,  &:t ) ; 

%totrlrsk(rfilld,cfilld,final.  Total  Relative  Risk  of  Dying  with  disease); 

Lethality  by  column  is  tabled  for  both  groups.  The  heading  of  the  table 
gives  the  results  of  a  chi-square  test  for  the  equality  of  the  vectors  of  lethalities 
between  the  two  groups. 


References: 
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Appendix  2:  SAS^Macros  for  Non-par ametric  Es¬ 
timation  of  the  Joint  Probability  Distribution  of  Time- 
to-Onset  of  Disease  and  Time-to-Death 

SSiS*******/ 

/**:(c*This  code  uses  output  from  prepjont  to  get  all  three  versions**** 
*****of  relative  risk.  **/ 

libname  library  ’ c : \windows\desktop\mortmorb ’ ; /*  where  formats  eire  stored*/ 
‘/.include  '  c :  \windows\desktop\mortmorb\relrski  j  .  sas  ’ ; 

‘/.include  ’  c :  \windows\desktop\mortmorb\colrsk .  sas  ’ ; 

‘/.include  ’  c :  \windows\desktop\mortmorb\totrlrsk.  sas  ’ ; 
options  nonotes; 

‘/.relrisk(rfilld,cfilld, Relative  Risk  Exposed  to  Controls); 

‘/.let  t  =  Relative  Risk  of  dying  with  disease  Exposed  to  Controls; 

‘/.colrskCrf  illd. cfilld.rcoltimn, final, fct)  ; 

‘/,totrlrsk(rfilld,cfilld, final.  Total  Relative  Risk  of  Dying  with  disease); 
/***Cancer . sas****/ 

libname  mm  ’ c : \windows\desktop\mortmorb ’ ; 
data  mm. cancer; 

infile  ’ c : \windows\desktop\mortmorb\cancer . dat ’ ; 
input 


Casenr 

$1-5 

Id 

$7 

/*  E=Exposed,  C=Controls  */ 

Race 

$9-10 

/*NB=Non  Black  BL=Black*/ 

Occup 

12 

/*l,2=Flying  Officers  3=Non-Flying  Officers 
4=Flying  Enlisted  5=Non-Flying  Enlisted  */ 

Yob 

14-15 

Mob 

17-19 

Yotourl 

21-22 

Motourl 

24-25 

Mortstat 

$27 

/*D=Death*/ 

ica 

$28 

/*  Legend  for  Cause  of  Death*/ 

Yod 

30-31 

Mod 

33-34 

Cancer 

36 

Yonset 

38-39 

Pptr 

41-47 

/*  Format  7.3*/ 

Body Fat 

49-54 

/*  Format  6.3*/ 

run; 

data  mm. cancer; 
set  mm. cancer; 
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if  yotourl  >  yonset  and  yonset  ne  .  then  delete; 
if  yonset  >  yod  and  yod  ne  .  then  delete; 
if  yonset  =  .  then  t  =  . ; 

else  t  =  yonset  -  yotourl ;  /♦time  to  disease  from  tourl . */ 

if  yod  =  .  then  d  =  . ; 

else  d  =  yod  -  yotourl;  Z+time  to  death  from  tourl.*/ 

current  =  97  -  yotourl;  /*time  from  tourl  to  present*/ 

if  t  =  .  and  cancer  =  .  and  d  =  .  then  do;  /*Type  23.  No  information 

♦  ♦♦about  cancer  status  euid 
♦♦♦death  is  right  censored.*/ 

obstl  =  . ; 
obstr  =  . ; 
obst  =  0 ; 
itl=0; 
itr=l ; 
obsdl  =  . ; 
obsdr  =  . ; 
obsd  =  current ; 
idl  =  0; 
idr  =  1 ; 

end; 

else  if  t  =  .  and  cancer  =  0  and  d  =  .  then  do;  /*Type  19.  Cancer  is 

♦♦♦known  not  to  have 
♦♦♦occurred  by  the  time 
♦♦♦death  was  right 
♦♦♦censored. */ 

obstl  =  . ; 
obstr  =  . ; 
obst  =  . ; 
itl=0; 
itr=l ; 
obsdl  =  . ; 
obsdr  =  . ; 
obsd  =  current ; 
idl  =  0; 
idr  =  1; 

end; 

else  if  t  ne  .  and  d  =  .  then  do;  /♦Type  5.  Time-to-onset  is  known 

♦♦♦and  death  time  right  censored.*/ 

obstl  =  . ; 
obstr  =  . ; 
obst  =  t ; 


39 


itl=0; 

itr=0; 

obsdl  =  .  ; 

obsdr  =  . ; 

obsd  =  current ; 

idl  =  0; 

idr  =  1; 

end; 

else  if  t  =  .  and  cancer  =  .  and  d  ne  .  then  do;  /*Type  21.  No  information 

*=i=*about  disease  and 
death  time  is  known.  */ 

obstl  =  . ; 
obstr  =  . ; 
obst  =  0; 
itl=0; 
itr=l ; 
obsdl  =  . ; 
obsdr  =  . ; 
obsd  =  d; 
idl  =  0; 
idr  =  0; 

end; 

else  if  t  =  .  and  cancer  =  0  and  d  ne  .  then  do;  /*Type  17.  Cancer  is 

***known  not  to  have 
***occurred  before  death 
and  death  time  known.*/ 


obstl  =  . 
obstr  =  . 
obst  =  . 


itl=0; 


itr=l ; 


obsdl  =  . 
obsdr  =  . 
obsd  =  d 


idl  =  0; 
idr  =  0; 


end; 

else  if  t  =  .  and  cancer  =  1  and  d  =  .  then  do;  /*Type  8.  Cancer  is  known 

***to  have  occurred  before 


***and  death  time  is  right 
***censored. */ 
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obstl  =  .  ; 

obstr  =  .  ; 

obst  =  current ; 

itl=l; 

itr=0; 

obsdl  =  . ; 

obsdr  =  . ; 

obsd  =  current ; 

idl  =  0; 

idr  =  1; 

end; 

else  if  t  ne  .  and  d  ne  .  then  do;  /*Type  1.  Both  time-to-onset  and 

***time-to-death  are  known.*/ 

obstl  =  . ; 
obstr  =  . ; 
obst  =  t; 
itl=0; 
itr=0; 
obsdl  =  . ; 
obsdr  =  . ; 
obsd  =  d; 
idl  =  0; 
idr  =  0; 

end; 

run; 

/*****cellsgn2 . sas  3/30/98**/ 

‘/oinacro  cellsgn(infile, datafile, clasfile,nuinrow,niimcol,dwod) ; 

/5(:s|!***:(<**:(c3(t*********  +  *************j|t****j)t***j)c*j|t******jtc****!(c3|c!)c;)c*!(t***:)!  +  * 

***Infile  contains  the  times  that  make  up  the  disease  and  death  *** 
=K**time  lattice.  *** 

***datafile  is  the  file  containing  the  mortality  morbidity  data.  *** 
***clasfile  will  be  the  datafile  augmented  with  obst,  obstl  and  *** 
!)!**obstr  classified  by  row  aind  obsd,  obsdl  and  sbsdr  classified  by*** 
***column.  numrow  and  niimcol  are  the  numbers  of  row  and  columns,  *** 
♦♦^respectively .  dwod  is  1  if  there  is  a  first  row  of  missing  *** 
♦♦♦disease  times  and  0  otherwise.  *** 

****:((:(c*:(!*^  ♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦♦♦♦♦♦♦♦♦♦/ 

‘/olet  numt  =  ‘/.evaK&numrow  -l-&dwod)  ; 

'/olet  numd  =  y,eval(&numcol  -  1); 
data  null; 
set  feinfile; 

‘/odo  i  =  1  7. to  fenumt; 
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call  symput ("t&i" .trim (left (t&i))) ; 

°/oend ; 
stop; 
run; 

data  null; 

set  feinfile; 

y.do  i  =  1  %to  fenumd; 

call  symput("d&i" ,trim(left(t&i))) ; 

‘/end; 

stop; 

run; 

/**The  next  data  step  classifies  the  disease  and  death  times  into*** 
♦♦♦respective  column  and  row  intervals.  Originally,  pairs  with  *** 
♦♦♦death  preceding  disease  axe  assigned  to  row  =0.  At  the  end  ♦** 
♦♦♦of  the  data  step,  if  there  are  such  observations,  i.e.  dwod=l  *** 
***then  dwod  (which  will  be  1  in  that  case)  is  added  to  each  row  ♦♦* 
♦♦♦number.  **/ 

data  feclasfile; 
set  &datafile; 

if  obsd  <=  &dl  then  column  =  1; 

if  obsdl  <=  &dl  then  columnl  =  1; 

if  obsdr  <=  Ml  then  columnr  =  1; 

y.do  i  =  2  ‘/.to  &numd  ; 

y.let  ii  =  y.eval(&i  -  1); 
if  &&d&ii<  obsd  <=  &M&i  then  column  =  &i; 
if  &M&ii<  obsdl  <=  &&d&i  then  columnl  =  &i; 

if  &M&ii<  obsdr  <=  &M&i  then  columnr  =  &i; 

‘/.end; 

if  obsd  >  &M&numd  then  column  =  Mumcol; 
if  obsdl  >  &&d&numd  then  columnl  =  Mumcol; 
if  obsdr  >  &&dMumd  then  col\imnr  =  Mumcol; 
if  obstr  =  0  then  do; 

row  =  1;  /**this  is  for  totally  unknown  disease  info.*/ 

rowr  =  0;  /*which  are  types  21  through  24.  */ 

rowl  =0; 

end; 

else  do; 

if  obst  =  .  then  row  =  0; 

else  if  obst  <=  &tl  then  row  =  1; 

if  obstl  =  .  then  rowl  =  0; 

else  if  obstl  <=  &tl  then  rowl  =  1; 

if  obstr  =  .  then  rowr  =  0; 
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else  if  obstr  <=  &tl  then  rowr  =  1; 
end; 

‘/odo  i  =  2  %to  fenumt; 

7olet  ii  =  ’/,eval(&i  -  1); 
if  &&t&ii  <  obst  <=  fefet&i  then  row  =  &i; 
if  &&t&ii  <  obstl  <=  &&t&i  then  rowl  =  &i; 

if  &&t&ii  <  obstr  <=  &&t&i  then  rowr  =  &i; 

‘/.end; 

if  obst  >  &&t&numt  then  row  =  ‘/.eval  (&nuint+l)  ; 
if  obstl  >  &&t&numt  then  rowl  =  */.eval(&nuint+l)  ; 

if  obstr  >  &&t&nuint  then  rowr  =  ‘/.evaK&nnmt+l)  ; 

row  =  row  +  &dwod; 
rowl  =  rowl  +  fedwod; 
rowr  =  rowr  +&dwod; 
run; 

‘/.mend; 

/*cellsgn (inf ile, datafile ,clasfile,numrow,numcol)*/ 
/**numrows  and  numcols  come  from  intprob2**/ 

/ *‘/.cellsgn(times .mortmorb , outmm,&numrows , fenumcols)  ;  */ 
/!t!**classif y .  sas  4/21/98**/ 

***In  order  to  implement  the  em  algorithm  using  *** 

***the  initial  probabilities  produced  by  test.sas  *** 


***we  need  to  classify  each  observation  in  the  *** 
*=i<*original  data  set  by  censoring  type.  This  *** 
**=t!classif ication  will  involve  a  variable  which  *** 
*=i==itgives  the  type  nvimber  and  the  interval  values  *** 
!)t=t:=t!for  the  censored  disease  and  death  times.  *** 
***We  will  also  append  the  array  of  variables  *** 
*=it*to  each  observation  to  hold  the  pseudo  data  *** 
***contributions  of  each  observation  to  each  *** 
***interval.  *** 
***Also  appended  to  each  observation  will  be  the  *** 


***current  set  of  joint  probabilities  for  disease  *** 


***and  death.  The  censoring  type  will  eind  the  *** 
***censored  values  will  determine  which  *** 
***probabilities  get  put  into  the  pseudo-data  *** 
***cells.  *** 


‘/.macro  classify(infile,outfile,dwod) ; 
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***Infile  contains  mortality/morbidity  data  of  the*** 


***fonn  built  by  whodis.  Outfile  will  contain  the  *** 
***same  data  with  another  variable  called  type,  *** 
***which  will  indicate  censoring  type.  *** 
***The  following  table  defines  the  censoring  types*** 
***that  this  macro  will  handle.  *** 
*+*dwod  indicates  by  a  1  if  there  are  cases  of  *** 


***known  deaths  without  disease  and  is  0  otherwise*** 

/********Attempt  to  classify  mortality  morbidity  data  by  ******** 
*********data  type.  */ 

‘/oif  ‘/oeval  (fedwod)  =  1  “/.then  */odo; 
data  dandd  justdeth; 
set  feinfile; 

if  (obst=.  and  obstl=.)  /*Death  info  known  to  be  before  disease.*/ 
or  (itL=  0  and  itR  =  1  and  obst  =  0  )  /*no  disease  info*/ 
then  output  justdeth; 
else  output  dandd; 
run; 

‘/end; 

‘/else  ‘/do; 
data  dandd; 
set  feinfile; 
run; 

‘/end; 

/***dandd  are  the  observations  who  were  known  to  have  the 
****disease  before  death.  Justdeth  speeiks  for  itself.*/ 

/*Now  we  use  the  indicators  to  classify  each  data  type*/ 
data  dandd; 
set  dandd; 

if  itl=0  and  itr=0  and  idl=0  and  idr  =  0  then  type  =  1; 
else  if  itl=l  and  itr=0  and  idl=0  and  idr  =  0  then  type  =  2; 

else  if  itl=0  and  itr=l  and  idl=0  and  idr  =  0  then  type  =  3; 

else  if  itl=0  and  itr=0  and  idl=l  and  idr  =  0  then  type  =  4; 

else  if  itl=0  and  itr=0  and  idl=0  and  idr  =  1  then  type  =  5; 

else  if  itl=l  and  itr=l  and  idl=0  and  idr  =  0  then  type  =  6; 

else  if  itl=l  and  itr=0  and  idl=l  and  idr  =  0  then  type  =  7 ; 

else  if  itl=l  and  itr=0  and  idl=0  and  idr  =  1  then  type  =  8; 

else  if  itl=0  and  itr=l  and  idl=l  and  idr  =  0  then  type  =  9; 
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else  if  itl=0  and  itr=l  and  idl=0  and  idr  =  1  then  type  =  10 

else  if  it 1=0  and  itr=0  and  idl=l  and  idr  =  1  then  type  =  11 

else  if  itl=l  and  itr=l  and  idl=l  and  idr  =  0  then  type  =  12 

else  if  itl=l  and  itr=l  and  idl=0  and  idr  =  1  then  type  =  13 

else  if  itl=l  and  itr=0  and  idl=l  and  idr  =  1  then  type  =  14 

else  if  itl=0  and  itr=l  and  idl=l  and  idr  =  1  then  type  =  15 

else  if  itl=l  and  itr=l  and  idl=l  and  idr  =  1  then  t3rpe  =  16 

rim; 

'/oif  y.eval  (&dwod)  =  1  '/.then  */,do; 
data  justdeth; 
set  justdeth; 
if  obst  =  .  then  do; 

if  idl=0  and  idr=0  then  type=17; 
else  if  idl=l  and  idr=0  then  type=18; 
else  if  idl=0  and  idr=l  then  type=19; 
else  if  idl=l  and  idr=l  then  type=20; 

end; 

else  if  obst  =  0  then  do; 

if  idl=0  and  idr=0  then  type=21; 
else  if  idl=l  and  idr=0  then  type=22; 
else  if  idl=0  and  idr=l  then  t3rpe=23; 
else  if  idl=l  and  idr=l  then  tjrpe=24; 

end; 

run; 

data  feoutfile; 

set  dandd  justdeth; 

rim; 

7oend; 

’/.else  ’/.do; 
data  feoutfile; 
set  dandd; 
run; 

‘/.end; 

‘/.mend ; 

/^classify (inf ile , outf ile ,dwod) ; ♦/ 

/*y,classify(outmm,outimn,l)  ;♦/ 

/***classify . sas  4/21/98**/ 

/♦♦*=l=*=t:=(:=('=l=****=)e***  +  ***********=|t*5|t*  +  ******^**  +  *:)c:t!!)cj)c%s|ts|c 

***In  order  to  implement  the  em  algorithm  using  *** 

***the  initial  probabilities  produced  by  test.sas  *** 

***we  need  to  classify  each  observation  in  the  *** 
*:**original  data  set  by  censoring  type.  This  *** 
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***classif ication  will  involve  a  variable  which  *** 
**=i=gives  the  tjrpe  number  and  the  interval  values  *** 


***for  the  censored  disease  cind  death  times.  *** 

**=i=We  will  also  append  the  array  of  variables 
**=t:to  each  observation  to  hold  the  pseudo  data  *** 
***contributions  of  each  observation  to  each  *** 

=i=**interval .  *** 


***Also  appended  to  each  observation  will  be  the  *** 
=t:**current  set  of  joint  probabilities  for  disease  *** 


=t=**and  death.  The  censoring  type  will  and  the  *** 
***censored  values  will  determine  which  *** 
♦^^probabilities  get  put  into  the  pseudo-data  **♦ 
♦♦♦cells.  ♦♦♦ 


Vomacro  classifyCinf ile,outf ile,dwod)  ; 

♦♦♦Infile  contains  mortality/morbidity  data  of  the^^^ 


♦♦♦form  built  by  whodis.  Outfile  will  contain  the  ♦♦♦ 
♦♦♦same  data  with  another  variable  called  type,  ♦♦♦ 
♦♦♦which  will  indicate  censoring  type.  ♦♦♦ 
♦♦♦The  following  table  defines  the  censoring  types^^^ 
♦♦♦that  this  macro  will  handle.  ♦♦♦ 
♦♦♦dwod  indicates  by  a  1  if  there  are  cases  of  ♦♦♦ 


♦♦♦known  deaths  without  disease  and  is  0  otherwise^^^ 

/♦♦♦♦♦♦♦♦Attempt  to  classify  mortality  morbidity  data  by  ♦♦♦♦♦♦♦♦ 
type.  ♦/ 

”/oif  y.eval (fedwod)  =  1  ’/.then  ’/.do; 
data  dandd  justdeth; 
set  feinfile; 

if  (obst=.  and  obstl=.)  /♦Death  info  known  to  be  before  disease. ♦/ 
or  (itL=  0  and  itR  =  1  and  obst  =  0  )  /♦no  disease  info*/ 
then  output  justdeth; 
else  output  dandd; 
run; 

’/.end; 

’/.else  ’/.do; 
data  dandd; 
set  feinfile; 
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run; 

'/oend ; 

/=i=*=i=dandd  are  the  observations  who  were  known  to  have  the 
****disease  before  death.  Justdeth  speaks  for  itself.*/ 
/*Now  we  use  the  indicators  to  classify  each  data  type*/ 
data  dandd; 
set  dandd; 

if  itl=0  and  itr=0  and  idl=0  and  idr  =  0  then  type  =  1; 
else  if  itl=l  and  itr=0  and  idl=0  and  idr  =  0  then  type  =  2; 

else  if  itl=0  and  itr=l  and  idl=0  and  idr  =  0  then  t3rpe  =  3; 

else  if  it  1=0  aind  itr=0  and  idl=l  and  idr  =  0  then  type  =  4; 

else  if  itl=0  and  itr=0  and  idl=0  and  idr  =  1  then  type  =  5; 

else  if  itl=l  and  itr=l  and  idl=0  and  idr  =  0  then  type  =  6; 

else  if  itl=l  and  itr=0  and  idl=l  and  idr  =  0  then  type  =  7; 

else  if  itl=l  and  itr=0  and  idl=0  and  idr  =  1  then  type  =  8; 

else  if  itl=0  and  itr=l  and  idl=l  and  idr  =  0  then  type  =  9; 

else  if  itl=0  and  itr=l  and  idl=0  and  idr  =  1  then  type  =  10 

else  if  it 1=0  and  itr=0  and  idl=l  and  idr  =  1  then  type  =  11 

else  if  itl=l  and  itr=l  and  idl=l  and  idr  =  0  then  type  =  12 

else  if  itl=l  and  itr=l  and  idl=0  and  idr  =  1  then  type  =  13 

else  if  itl=l  and  itr=0  and  idl=l  and  idr  =  1  then  type  =  14 

else  if  itl=0  cuid  itr=l  and  idl=l  and  idr  =  1  then  type  =  15 

else  if  itl=l  and  itr=l  and  idl=l  and  idr  =  1  then  type  =  16 

rim; 

y.if  7,eval(&dwod)  =  1  ’/then  ’/do; 
data  justdeth; 
set  justdeth; 
if  obst  =  .  then  do; 

if  idl=0  and  idr=0  then  type=17 ; 
else  if  idl=l  and  idr=0  then  type=18; 
else  if  idl=0  and  idr=l  then  type=19; 
else  if  idl=l  and  idr=l  then  t3rpe=20; 

end; 

else  if  obst  =  0  then  do; 

if  idl=0  and  idr=0  then  type=21; 
else  if  idl=l  and  idr=0  then  type=22; 
else  if  idl=0  and  idr=l  then  type=23; 
else  if  idl=l  and  idr=l  then  type=24; 

end; 

run; 

data  feoutfile; 
set  dandd  justdeth; 


run; 

%end ; 

“/.else  '/odo; 
data  feoutfile; 
set  dandd; 
run; 

°/oend; 

’/omend ; 

/*classif y (inf ile , outf ile ,dwod) ; */ 
/*yoClassify(outnffli,outmm,l)  ;*/ 

%macro  colrsk (numcovp , dencovp , coif mt , outf ile , t it le) ; 

5K*sicTiiis  macro  takes  two  matrices  of  estimated  joint  *** 
^♦♦probabilities  and  produced  a  table  of  relative  ♦*♦ 
♦♦♦risks  with  their  standard  deviations .  ♦*♦ 

♦♦♦numcovp  is  a  SAS  data  set  for  the  numerator  of  the*** 
**^relative  risk,  dencovp  is  a  SAS  data  set  for  the  *** 
***denominator  of  the  relative  risks.  Both  SAS  data  *** 
***sets  must  have  the  same  number  or  rows  and  columns*** 
♦♦♦corresponding  to  the  same  set  of  rectangles  in  a  ♦♦* 
***two  dimensional  matrix.  The  columns  for  the  data  *** 
♦*^sets  must  be  variables  named  covl-covK,p,  row  and  *** 
♦♦♦col-umn.  K  is  the  n\amber  of  joint  non-zero  joint  *** 
♦♦♦probabilities  for  the  rectangles  and  also  the  *** 
♦♦♦number  of  rows.  Covl-covK  are  the  columns  of  the  ♦♦♦ 
♦♦♦covariance  matrix  for  the  estimates  of  the  K  joint*** 
♦♦♦probabilities  contained  in  variable  p.  Row  and  *** 
♦♦♦coliimn  are  the  row  and  column  indices,  indicating  *** 
***the  coordinates  of  the  rectangle  for  p  is  the  *** 
***estimate  of  the  joint  probability.  The  row  and  *** 
♦♦♦column  values  for  both  data  sets  must  be  the  same  *** 
*^*and  in  the  same  order.  Colfmt  is  the  format  that  *** 
♦♦♦gives  the  column  intervals  defined  by  the  column  *** 


♦♦♦values .  *** 
***A  SAS  data  set  given  the  name  in  the  argument  *** 
♦♦♦outf ile  will  contain  number  of  observations  equal  *** 
♦♦♦to  the  number  of  columns  with  variables  relrisk,  *** 
***stdev  and  column.  ♦♦♦ 
***Title  will  be  the  title  of  the  printed  output .  *** 


/**WARNING! ! !  WARNING! I  I 


WARNING  I ! I  WARNING !!!♦♦♦ 
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***  Be  sure  that  the  follwoing  include  statements  *** 

***have  the  correct  addresses .  *** 

***WARNING ! ! !  WARNING ! ! !  WARNING ! ! !  WARNING !!!*♦/ 

'/.include  ’  c :  \windows\desktop\mortmorb\submtrx .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\sumstd .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\ratiostd .  sas  ’  ; 

/***The  following  code  gets  the  dimension  of  the  pvector  for  the  two*** 
****groups  and  tests  to  see  if  they  are  equal.  *** 

.!(=(!****♦*****♦*!(:***♦***!(!  **********************************  **********/ 

data  _null_; 

set  fenximcovp  nobs=n  end  =  last; 
call  symput ( ’ rdim ’ , compress (n) ) ; 
if  last  then  do; 

call  symput (’colnum’ .compress (colvimn)) ;  /**number  of  columns  put  into 

macro  variable  colnum**/ 

call  symput ('rownum’ .compress (row)) ;  /**number  rows  in  rownum**/ 
end; 
run; 

data  _null_; 
set  fedencovp  nobs=n; 
if  n  =  &rdim  then  do; 
end  =  0; 

call  symput ( ’ end ’ , compress (end) ) ; 
stop; 

end; 

else  do; 

end  =  1 ; 

call  symput ( ’ end ’ . compress (end) ) ; 
stop; 

end; 

run; 

’/.if  &end  =  1  ’/.then  ’/.do; 

’/.put  The  dimensions  of  the  two  probability  vectors  are  imequal.; 

’/.put  We  cannot  compute  the  requested  table . ; 

’/.end; 

’/.else  ’/.do; 

data  covnum;  is  the  prefix  for  the  numerator.**/ 

set  fenumcovp; 
array  cov{&rdim}; 
array  rcov{&rdim}; 

’/.do  i  =  1  ’/.to  ferdim; 

rcov(&i)  =  cov(&i); 
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%end; 


rrow  =  row; 
rcolumn  =  column; 

run; 

data  covden;  is  the  prefix  for  the  denominator.**/ 

set  fedencovp; 
array  cov{&rdim}; 
array  ccov{&rdim}; 

"/odd  i  =  1  '/oto  ferdim; 

ccov(&i)  =  cov(&i) ; 

y.end; 

crow  =  row; 
c column  =  column; 

run; 

data  bothcovp; 
merge  covnum  covden; 
run; 

/***testing  to  be  sure  the  row  and  column  indices  match.***/ 
data  _null_; 
set  bothcovp  ; 

if  rrow  ne  crow  or  rcolvimn  ne  ccolumn  then  do; 
end  =1; 

call  S3nnput  ( ’  end  ’ ,  compress  (end)  )  ; 
stop; 

end; 

run; 

y.if  &end  =  1  ’/then  ’/do; 

'/put  The  row  and/or  coliimn  indices  for  the  two  vectors; 

’/put  Do  not  match.  ; 

’/put  We  must  stop  here .  ; 

’/end; 

’/else  ’/do; 

data  covnum; 
set  covnum; 
row=rrow ; 
column  =  rcolumn; 

drop  rrow  rcolumn  rcovl-rcov&rdim  ; 
run; 

data  covden; 
set  covden; 
row=crow; 
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column  =  ccolumn; 

drop  crow  ccolumn  ccovl-ccov&rdim; 
rvin; 

%do  j  =  1  %to  fecolnum; 

/****Now  we  pull  out  the  probabilities  and  covariance**** 
*****submatrix  needed  to  compute  the  numerator  and  **** 
*****denominator  for  the  relative  risk  of  dying  with  **** 
*****disease  in  column  j .  It  is  assumed  that  row  1  **** 

*****death  without  disease.  So  that  the  probability  **** 
*****of  dying  with  disease  in  column  j  is  the  sum  of  **** 
*****the  joint  probabilties  in  column  j  from  row  2  to**** 
*****the  min  of  rownum  and  (j+1) .  **** 

data  _null_; 

maxrow  =  &j  +  1; 

last row  =  min (maxrow, ftrownum) ; 

call  symput ( ’ lastrow ’ , compress (lastrow) ) ; 

stop; 

run; 

y,submtrx(covnum,2,&j  ,&lastrow,&j  .numjcov) ; 
7osubmtrx(covden,2,&j  ,&lastrow,&j  ,denjcov) ; 
y.sximstdCnumjcoVjCov.n-umout)  ; 
y,svimstd(denjcov,cov,denout)  ; 

/**The  files  numout  and  denout  contain  one  observation  each*** 
***with  variables  sum  and  stdev,  which  are  the  sum  of  the  *** 
***probabilities  in  the  column  and  the  standard  deviation  *** 
***of  the  svtm,  respectively.  The  next  code  puts  these  *** 
***quantities  into  macro  variables  with  the  appropriate  *** 
***names.  *** 

data  _null_; 

set  numout; 

sig  =  stdev*stdev; 

call  symput (’numsig’ .compress (sig)) ; 
call  symput ('numsum' .compress (sum)) ; 
stop; 
run; 

data  _null_; 
set  denout; 
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sig  =  stdev*stdev; 

call  symput ( ’ densig ’ , compress (sig) ) ; 
call  symput  ( ’  densum  ’ ,  compress  (siun) ) ; 
stop; 
run; 

/♦♦♦The  next  call  to  ratiostd  computes  the  relative  risk  for  row 
♦♦♦♦along  with  its  standard  deviation.  *** 

♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦♦♦♦♦♦♦♦♦♦♦/ 

'/orat iostd  (feninnsum ,  fedensum ,  fenumsig ,  fedensig ,  0 ,  ratout )  ; 

/♦♦♦Now  the  relative  risk  and  standard  deviation  will  be  put  into^^^ 
♦♦♦♦the  output  data  set.  *** 

♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦/ 

‘/.if  &j  =  1  ‘/.then  ‘/.do; 
data  feoutfile; 

label  relrisk  =  ’Relative  Risk’  stdev  =  ’Standard  Deviation’; 
set  ratout; 
relrisk  =  ratio; 
column  =  1; 

keep  column  relrisk  stdev; 
run; 

’/.end ; 

‘/.else  ’/.do; 

data  ratout; 
set  ratout; 
relrisk  =  ratio; 
coliimn  =  &j  ; 

keep  column  relrisk  stdev; 
run; 

data  feoutfile; 

label  relrisk  =  ’Relative  Risk’  stdev  =  ’Standard  Deviation’ 
column  =  ’Time -to -Death’ ; 
set  feoutfile  ratout; 
run; 

‘/.end ; 

’/.end; 

proc  tabulate  data  =  feoutfile  format=f 10.6; 
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format  column  fecolfmt..; 

class  column; 

var  relrisk  stdev; 

table  column,  sum* (relrisk  stdev); 

keylabel  sum  =  ’  ’ ; 

title  "fetitle"; 
run; 

7.  end; 

70  end ; 

7omend ; 

/*example*/ 

/*colrsk(numcovp , dencovp , coif mt , outfile .title) */ 

/ *******example*******/ 

/*libname  library  ’c:\windows\desktop\mortmorb’;  where  formats  are  stored*/ 
/*options  nonotes; 

7.1et  t  =  Relative  Risk  of  dying  with  disease  Exposed  to  Controls; 
7ocolrsk(rfilld,cfilld,rcolumn, final, &t)  ;  */ 

/**em.sas  3/30/98**/ 

/*Uses  the  output  of  cellsgn,  newtimesS  euid  the  array  numbering 
***system  of  sas  along  with  the  data  type  combined  with  &probfile 
***to  determine  how  to  allocate  each  observation  to  the  pseudo 
***data  array.**/ 

/********This  data  step  puts  the  cell  probabilities  into 
***each  observation  of  the  mortmorb  data  set  with  the 
***row  and  column  classification  of  fedatafile.  ♦****/ 

7omacro  emCdataf ile ,probfile ,emout, probout ,numrows ,numcols, total ,tol , its 

,dwod) ; 

/ :|t ** 3|c ******  ;<c ** *♦*;** :)c :(c * ******* j(t ^)c :(c *  +  ***  +  ****** s|c **** :(c * 3|c ^ :|c * :(c * 5|c * :t: 

***This  macro  performs  the  em  algorithm  on  the  mortality  morbidity  *** 
***data  contained  in  datafile.  Data  file  must  have  a  variable  named  *** 
***type  that  indicated  what  is  the  censoring  type  for  each  *** 

***observation.  Censoring  t5rpes  eire  assigned  according  to  the  macro  *** 
***classify  which  uses  the  24  censoring  types  from  Mihalko  &  Michalek*** 
***(1998).  Each  observation  must  also  be  classified  according  to  *** 
***the  row  and  column  of  the  cell  of  the  disease  by  death  matrix  *** 

***defined  by  the  matrix  of  probabilities  in  probfile.  Normally  *** 

***datafile  will  be  the  output  file  from  the  macro  classify.  *** 

***probfile  must  contain  fctotal  probabilities  than  sum  to  1 


*** 
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***cind  have  names  newpl  to  newp&total  as  SAS  numbers  numrows  by  *** 
***numcols  array,  tol  is  the  convergence  check.  *♦/ 

***The  pseudo  data  are  computed  by  s^lmming  the  probabilites  of  each  *** 
***cell  that  the  censoring  pattern  allows  as  a  possibility  cind  using  *** 
***the  sum  as  the  denominator  for  the  conditional  cell  probabilities  *** 


***computed  by  dividing  the  denominator  into  the  cell  probability  *** 
**!t!for  each  possible  cell.  The  initial  probabilities  are  those  in  *** 
***probfile.  Probfile  is  updated  after  all  the  observations  have  *** 
***been  distributed  according  to  their  possible  cell  memberships  by  *** 
=K**computing  new  cell  probabilities  using  the  sample  proportions  of  *** 
**!)=the  current  pseudodata.  *** 


♦  He*********  *♦*****♦♦*:♦!  **************!)=  ******=!=*♦♦!(=****♦♦=♦:♦*********♦***♦♦/ 

data  nextprob; 
set  feprobfile; 
run; 

‘/olet  ind  =  0; 

“/.let  i  =  1; 

“/.do  “/.while  (&ind  <  1  AND  &i  <=  &its  ); 
data  &emout ; 

merge  fedatafile  nextprob; 
array  newp{&numrows,&numcols}-; 
array  np{&numrows,&numcols}; 
if  _n_  =  1  then  do ; 

do  i  =  1  to  ftnumrows; 

do  j  =  1  to  fenumcols; 

np(i, j)  =  newpCi, j) ; 

end; 

end; 

end; 

retain  npl  -  np&total; 

drop  newpl  -  newp&total  i  j ; 

run; 

/**In  the  following  code  the  column- 1  and  row+1  are  due  to  the 
***fact  that  row  1  is  disease  missing  row.  So  that  row  i  is  the 
***same  as  column(i-l) . **/ 

data  &;emout ; 
set  feemout; 

array  np{&numrows,&numcols}; 

array  ps{&numrows,&numcols};  /♦*These  will  contain  the  pseudodata 
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contributions . */ 

denom  =  0; 

do  i  =  1  to  fenumrows; 

do  j  =  1  to  fenumcols; 

ps(i,j)  =  0; 
end; 
end; 

if  type  =  1  then  do; 

ps (row, column)  =  1; 

end; 

else  if  tjrpe  =  2  then  do; 

•  do  i  =  1  +  fedwod  to  row; 

denom  =  denom  +  np(i, column) ; 

end; 

do  i  =  1  +  fcdwod  to  row; 
if  denom  =  0  then  ps(i, column)  =  0; 
else  ps(i, column)  =  np(i, column) /denom; 
end; 

end; 

else  if  type  =  3  then  do; 

do  i  =  row  to  min(&numrows , column+  fcdwod) ; 
denom  =  denom  +  np(i,coliunn)  ; 

end; 

do  i  =  row  to  min(&numrows , column+  fedwod) ; 
if  denom  =  0  then  ps(i, column)  =  0; 
else 

ps(i,  column)  =  np(i,coliimn) /denom; 

end; 

end; 

else  if  type  =  4  then  do; 

do  j  =  row-  &dwod  to  minC&numrows-  &dwod , coliimn)  ; 
denom  =  denom  +  np (row , j ) ; 

end; 

do  j  =  row-  &dwod  to  min(&numrows-  fedwod, column) ; 
if  denom  =  0  then  ps(row,j)  =  0; 
else  ps(row,j)  =  np(row,j) /denom; 

end; 


end; 


else  if  type  =  5  then  do; 

do  j  =  maxCrow-  &dwod, column)  to  fenumcols; 

denom  =  denom  +  np(row,j); 
end; 

do  j  =  max (row-  fedwod, column)  to  fenumcols; 
if  denom  =  0  then  ps(row,j)=0; 
else  ps(row,j)  =  np (row, j) /denom; 
end; 

end; 

else  if  type  =  6  then  do; 

do  i  =  rowl  to  min(rowr,column+  fedwod) ; 

denom  =  denom  +  np(i, column) ; 
end; 

do  i  =  rowl  to  min(rowr,column+  &dwod) ; 
if  denom  =  0  then  ps(i, column)  =0; 
else  ps(i jColiomn)  =  np(i, column) /denom; 
end; 

end; 

else  if  type  =  7  then  do; 

do  i  =  1+  &dwod  to  row; 
do  j  =  i-  ftdwod  to  column; 
denom  =  denom  +  np ( i , j ) ; 

end; 

end; 

do  i  =1  +  &dwod  to  row; 

do  j  =  i-  fedwod  to  column; 

if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 

end; 

else  if  type  =  8  then  do; 

do  i  =  1  +  fedwod  to  min(row, column+  fedwod) ; 
do  j  =  max  (row-  fedwod,  column)  to  feniomcols; 

denom  =  denom  +  np(i,j); 
end; 
end; 

do  i  =  1+  fedwod  to  min(row, column+  &dwod) ; 
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do  j  =  max (row-  &dwod, column)  to  fenumcols; 

if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i,j) /denom; 
end; 
end; 

end; 

else  if  type  =  9  then  do; 

do  i  =  row  to  min(&numrows,column+  &dwod) ; 
do  j  =  i-  &dwod  to  column; 

denom  =  denom  +  np(i,j); 

end; 

end; 

do  i  =  row  to  min(&numrows,column+  fedwod) ; 
do  j  =  i-  fedwod  to  column; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i,j) /denom; 
end; 

end; 

end; 

else  if  type  =  10  then  do; 

do  i  =  row  to  fenumrows; 

do  j=  max(i-  fedwod, column)  to  fenumcols; 

denom  =  denom  +  np(i,j); 
end; 
end; 

do  i  =  row  to  fen\imrows ; 
do  j=  maixCi-  fedwod, column)  to  fenumcols; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 

end; 

else  if  type  =  11  then  do; 

do  j  =  columnl  to  columnr; 

denom  =  denom  +  np(row,j); 

end; 

do  j  =  columnl  to  columnr; 


if  denom  =  0  then  ps(row,j)  =  0; 
else  ps(row,j)  =  np (row, j) /denom; 
end; 

end; 

else  if  type  =  12  then  do; 

do  i  =  rowl  to  rowr; 

do  j  =  i-  fedwod  to  column; 

denom  =  denom  +  np(i,j); 
end; 
end; 

do  i  =  rowl  to  rowr; 

do  j  =  i-  fedwod  to  column; 

if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 

end; 

else  if  type  =  13  then  do; 

do  i  =  rowl  to  rowr; 

do  j  =  column  to  fenumcols; 

denom  =  denom  +  np(i,j); 
end; 
end; 

do  i  =  rowl  to  rowr; 

do  j  =  column  to  fenumcols; 

if  denom  =  0  then  ps(i,j)  =0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 

end; 

else  if  type  =  14  then  do; 

do  i  =  1+  fedwod  to  row; 

do  j  =  columnl  to  columnr; 

denom  =  denom  +  np(i,j); 
end; 
end; 

do  i  =  1+  &dwod  to  row; 

do  j  =  columnl  to  coliimnr; 

if  denom  =  0  then  ps(i,j)  =  0; 


else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 

end; 

else  if  type  =  15  then  do; 

do  i  =  row  to  minC&numrows, coluinnr+  fedwod)  ; 
do  j  =  max(i-  fedwodjcolumnl)  to  columnr; 

denom  =  denom  +  np(i,j); 
end; 

end; 

do  i  =  row  to  min(&n\imrows ,  coluinnr+  &dwod) ; 
do  j  =  maixCi-  &dwod,columnl)  to  columnr; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 

end; 

end; 

else  if  type  =  16  then  do; 

do  i  =  rowl  to  rowr; 

do  j  =  columnl  to  columnr; 

denom  =  denom  +  np(i,j); 

end; 

end; 

do  i  =  rowl  to  rowr; 

do  j  =  columnl  to  columnr; 

if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i,j) /denom; 

end; 

end; 

end; 

else  if  type  =  17  then  ps(l, column)  =  1; 

else  if  type  =  18  then  do; 

do  j  =  1  to  column; 

denom  =  denom  +  np ( 1 , j ) ; 

end; 

do  j  =  1  to  col\imn; 
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if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l,j) /denom; 
end; 

end; 

else  if  type  =  19  then  do; 

do  j  =  column  to  ftnumcols; 

denom  =  denom  +  np(l,j); 

end; 

do  j  =  column  to  ftnumcols; 

do  i  =  min(&numrows,column+l)  to  minC&numrows, j+1) ; 

denom  =  denom  +  np(i,j); 
end; 

end; 

do  j  =  column  to  fenumcols; 

if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l,j) /denom; 

end; 

do  j  =  column  to  feniimcols; 

do  i  =  min(&numrows,column+l)  to  min(&numrows , j+1) ; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 

end; 

end; 

else  if  type  =  20  then  do; 

do  j  =  col\imnl  to  columnr; 
denom  =  denom  +  np(l,j); 

end; 

do  j  =  columnl  to  columnr; 

do  i  =  min(&numrows,columnl+l)  to  minC&numrows, j+1) ; 

denom  =  denom  +  np(i,j); 
end; 

end; 

do  j  =  col\imnl  to  columnr; 

if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l, j)/denom; 
end; 

do  j  =  coliimnl  to  columnr; 

do  i  =  minC&numrows, columnl+1)  to  min(&numrows, j+1) ; 
if  denom  =  0  then  ps(i,j)  =  0; 
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end; 


else  ps(i,j)  =  np(i, j)/denom; 
end; 

end; 


else  if  type  =  21  then  do; 

if  fedwod  then  denom  =  np (1, column) ; 
do  i  =  row  to  min(&numrows ,column+  fedwod) ; 
denom  =  denom  +  np(i, column) ; 

end; 

if  fedwod  then  do; 

if  denom  =  0  then  ps(l, column)  =  0; 
else  ps(l, column)  =  np(l, column) /denom; 

end; 

do  i  =  row  to  min(&numrows, column+  fedwod) ; 
if  denom  =  0  then  ps(i, column)  =  0; 
else 

ps(i, column)  =  np(i, column) /denom; 

end; 

end; 


else  if  type  =  22  then  do; 
if  fedwod  then  do; 

do  j  =  1  to  column; 

denom  =  denom  +  np(l,j); 

end; 

end; 

do  i  =  row  to  min(fenumrows,column+  fedwod); 
do  j  =  i-  fedwod  to  column; 

denom  =  denom  +  np(i,j); 


end; 

end; 

if  fedwod  then  do; 

do  j  =  1  to  column; 

if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l,j) /denom; 
end; 
end; 

do  i  =  row  to  min(fenumrows , column+  fedwod); 
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do  j  =  i-  fedwod  to  column; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 

end; 

end; 

else  if  type  =  23  then  do; 
if  fedwod  then  do; 

do  j  =  column  to  feniuncols; 

denom  =  denom  +  np ( 1 , j ) ; 

end; 

end; 

do  j  =  column  to  &numcols; 

do  i  =  min(&numrows,column+&dwod)  to  minC&numrows , j+&dwod) ; 

denom  =  denom  +  np(i,j); 
end; 

end; 

do  i  =  row  to  min(&numrows-l, column  +&dwod  -1); 
do  j=  column  to  fenumcols; 

denom  =  denom  +  np(i,j); 
end; 
end; 

if  fedwod  then  do; 
do  j  =  column  to  fenumcols; 

if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l, j)/denom; 

end; 

end; 

do  j  =  column  to  fenumcols; 

do  i  =  min(&numrows,column+&dwod)  to  min(&numrows , j+&dwod) ; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 

end; 

do  i  =  row  to  min (&numrows-l, column  +&dwod  -1); 
do  j=  column  to  fenumcols; 

if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i, j)/denom; 
end; 
end; 
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end; 

else  if  type  =  24  then  do; 

if  fedwod  then  do; 

do  j  =  coluuml  to  colmnnr; 
denom  =  denom  +  np(l,j); 

end; 

end; 

do  i  =  row  to  min(&numrows-l  ,col\iinnl+  &dwod-l)  ; 
do  j  =  columnl  to  columnr; 

denom  =  denom  +  np(i,j); 
end; 

end; 

do  i  =  min(&numrows,columnl+  fedwod)  to  min(&numrows,columnr+  fedwod) ; 
do  j  =  max(i-  fedwod, columnl)  to  columnr; 

denom  =  denom  +  np(i,j); 
end; 

end; 

if  fedwod  then  do; 
do  j  =  columnl  to  colvimnr; 

if  denom  =  0  then  ps(l,j)  =  0; 
else  ps(l,j)  =  np(l, j)/denom; 
end; 
end; 

do  i  =  row  to  min(&numrows-l,col\imnr+  &dwod-l)  ; 
do  j  =  m2Lx(i-  &dwod, columnl)  to  columnr; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i,j) /denom; 
end; 

end; 

do  i  =  min(&numrows,columnl+  fedwod)  to  min(&numrows , columnr+  fedwod) ; 
do  j  =  max(i-  &dwod, columnl)  to  columnr; 
if  denom  =  0  then  ps(i,j)  =  0; 
else  ps(i,j)  =  np(i,j) /denom; 
end; 

end; 


end; 

run; 


data  pseud; 
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set  feemout  end  =  last  nobs=datan; 
array  pseu{&numrows,&numcols}; 
array  ps{&numrows , fenumcols} ; 
do  i=  1  to  ftnumrows; 
do  j  =  1  to  fenumcols; 
pseud,  j)  +  ps(i,  j)  ; 

end; 

end; 

if  last  then  do; 
n  =  0; 

do  i  =  1  to  fenumrows; 
do  j  =  1  to  &numcols; 
n  =  n  +  pseu(i,j) ; 
end; 
end; 

if  (round (n)  -  round (datan))  then  do; 

put  ’Pseudo  data  does  not  add  up’; 
put  ’n  is  ’  n  ’  while  datan  is  ’  datan; 
end; 

end; 

call  symput ( ’ emtot ’ , left (trim (n) ) ) ; 
if  last ; 

keep  pseul  -  pseu&total; 
run; 

/:(c***xhis  data  set  saves  last  set  of  probs.****/ 
data  oldprobs; 
set  nextprob; 

array  newp{&numrows , fenumcols} ; 
array  oldp{&n\imrows ,  fenumcols} ; 
do  i  =  1  to  fenumrows; 
do  j  =  1  to  fenumcols ; 

oldp(i,j)  =  newp(i,j); 

end; 

end; 

keep  oldpl  -  oldp&total; 
run; 

/^^^^This  data  set  generates  the  next  set  of  probabilities.****/ 
data  nextprob; 
set  pseud; 
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array  pseu{&;numrows ,  fenumcols} ; 
array  newp{&numrows,&numcols}-; 
do  i  =  1  to  fentimrows; 
do  j  =  1  to  fenumcols; 

newp(i,j)  =  pseu(i,j)/&enitot; 

end; 

end; 

keep  newpl  -  newp&total; 
run; 

/***These  probs  should  then  be  fed  into  the  first  data  step. 

♦***But  first  we  need  to  compare  these  probs  to  the  last 
!ic***probs  to  check  for  convergence.***/ 

/*=i=**testing  for  convergence. *******/ 
data  _null_; 

merge  oldprobs  nextprob; 
array  newpC&numrows ,&numcols) ; 
array  oldp (fenumrows , fenumcols) ; 
diffsq  =  0; 
do  i  =  1  to  ftnumrows; 
do  j  =  1  to  fenumcols; 

diffsq  =diffsq  +  (newp(i,j)  -  oldp(i,j))*  (newp(i,j)  -  oldp(i,j)) 

end; 

end; 

call  symput ( Miff sq’ .left (trim(diff sq) ) ) ; 
r\m; 

data  _null_; 
a  =  fediffsq; 
b  =  fetol; 

/*put  M  is  ’  a; 
put  ’b  is  ’  b;*/ 
if  a  <  b  then  do; 
ind  =  1 ; 

call  symputC’ind’ , left (trim (ind)) ) ; 
end; 
stop ; 
run; 

/*yoput  ind  is  feind;  */ 

‘/.let  i  =  ’/.eval(fei  +1); 

‘/.let  numits  =  ‘/.eval(&i-l)  ; 
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'/.if  &i  >=  '/.eval(&its)  '/.then  '/.do; 

'/.put  We  have  reached  &its  iterations  with  sum  squared  differences 
equal  to  fediffsq; 

'/.end; 

'/.end; 

/******For  simulation  put  fenumits  into  data  set  to  get 
******convergence  rate.***/ 

'/.put  The  final  numits  is  feniimits; 

'/.put  The  final  diffsq  is  fediffsq; 
data  feprobout; 
set  nextprob; 
run; 

'/.mend ; 

/*  emCdataf ile  ,probf ile , emout  .probout  .numrows  ,numcols .total ,tol . its) */ 
/*'/.em(outmm.newprobs .pseudo. finalprb. &numrows .fenumcols.&total. . 0001 . 10)  ;*/ 
/*test  to  seee  if  total  pseudo  data  adds  to  total  in  mortmorb*/ 

/*data  _null_; 
set  pseud; 

array  pseu{&numrows,&numcols}; 
sum  =  0; 

do  i=  1  to  feniimrows; 
do  j  =  1  to  fenumcols; 

sum  =  sum  +  pseu(i,j); 
end; 
end; 

call  symput ( ’pseutot ’ .left (trim(sum)) ) ; 
run; 

'/.put  pseudo  count  total  seems  to  be  fepseutot;  */ 

/*Testing  distribution  sums  by  type.*/ 

/*proc  sort  data=&emout; 

by  type; 

run; 

data  test; 
set  feemout ; 
by  type; 

array  ps{&numrows . ftnumcols} ; 
sum=0; 

do  i  =  1  to  fenumrows; 
do  j  =  1  to  fenumcols; 
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svun  =  sum  +  ps(i,j); 
end; 
end; 

keep  sum  type; 
rim; 

proc  univariate  data  =  test; 
var  sum; 
by  type; 
rim;*/ 

'/.macro  est cov  (matrix ,  estcov)  ; 

***This  macro  computes  covariance  matrix  for  a  vector  of  *** 
***multinomial  cell  probabilities.  *** 

***Matrix  is  a  data  set  containing  the  k  rows  and  columns  *** 
***of  an  information  matrix  in  variables  named  coll  through*** 
***colk.  It  also  contains  variables  named  p,  row  and  *** 
♦**column.  The  variable  named  p  contains  *** 

***the  first  k  of  k+1  probabilities.  The  vairiables  row  and*** 
♦♦♦column  conain  the  row  and  column  number  of  the  original  ♦♦* 
♦♦♦matrix  of  rectangles  for  the  probability  in  p.  Letting  ♦♦* 
♦♦♦im  and  p  be  the  matrix  and  probability  vector,  the  *** 

♦♦♦output  will  be  in  the  data  set  named  in  estcov.  The  *** 

*^^data  set, estcov,  will  have  k  observations  with  variables*** 
*^^covl  through  covk,row,  col  and  p.  Covl  through  covk  will*** 
***contain  the  inverse  of  coll  though  colk.  Row,  col  and  p  *** 

***will  be  the  same  as  in  matrix.  Covl  through  covk  will  *** 

*^^covariance  matrix  for  the  vector  of  multinomial  *** 

*^^be  the  estimates,  p.  *** 

****:i)t*:)c:)t*.|c*:(c***:(c:)t!t:.|c*****:t:*3|c******.|t**j|c****!(e*jtt**:+^^:(t:(c.)c*!|c**:t:*j(:**  / 

data  _null_; 

set  fematrix  nobs=n; 

inf  dim  =  n; 

call  symputC’infdim' , compress (inf dim)) ; 

stop; 

run; 

data  _inf_; 

set  fematrix; 

keep  coll-col&inf dim; 

run; 


data  _rc_; 
set  toatrix; 
keep  row  column  p; 
run; 

proc  iml; 

/♦♦♦♦converting  matrix  to  matrix  and  vector  form^^^^/ 

•  use  _inf_; 

read  all  var  _num_  into  inf; 

cov  =  inv(inf); 
create  cov  from  cov; 

append  from  cov; 
quit ; 

data  feestcov; 
merge  cov  _rc_; 
array  col{&inf dim} ; 
array  cov{&inf dim} ; 
do  i  =  1  to  feinfdim; 

cov(i)  =  col(i) ; 
end; 

drop  coll-col&inf dim  i; 
rim; 

'/.mend ; 

’/.macro  pstdtab(covp,rowfmt,colfmt,colen,fmtcell, title)  ; 

♦♦♦This  macro  tables  the  joint  probabilities  and  their^^^ 
♦♦♦standard  deviations  for  a  two  dimensional  matrix  of^^^ 
♦♦*of  rectangles.  ♦♦♦ 

♦♦=i!covp  is  a  SAS  data  set  containing  k  observations.  ♦♦♦ 
♦♦♦The  k  observations  contain  variables  covl-covk,  row^^^ 
♦♦♦col  and  p.  Covl-covk  is  the  covariance  matrix  for  ♦♦♦ 
♦♦♦the  multinomial  probability  estimates  in  p.  Row  ♦♦♦ 
♦♦♦and  col  contain  the  row  and  coliimn  for  the  values  ♦♦♦ 
♦♦♦in  p  in  the  original  matrix  of  rectangles.  There  ♦♦♦ 
♦♦♦are  k+1  probabilities.  The  missing  probability  is  ♦♦♦ 


♦♦♦the  probability  for  the  lower  right  rectangle  of  ♦♦♦ 
♦♦♦matrix.  The  row  and  coliimn  numbers  for  this  ♦♦♦ 
♦♦^probability  are  max  of  the  row  values  and  max  of  ♦♦♦ 
♦♦♦the  column  values,  respectively.  ♦♦♦ 


♦♦♦The  last  probability  is  computed  as  1  -  the  sum  of  ♦♦♦ 
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♦**the  values  in  p.  Its  variance  is  the  sum  of  the  *** 
***entries  in  the  covariance  matrix,  covl-covk.  *** 
***Title  will  be  the  title  of  the  table.  *** 
***Rowfmt  cind  colfmt  are  the  formats  that  for  the  rows*** 


***and  column.  For  example  row  =1  may  be  death  w/o  *** 
***disease  eind  the  rest  of  the  rows  aind  columns  may  *** 
***partitions  of  (0,inf inity) .  *** 
***colen  is  the  width  desired  for  the  table  columns.  *** 
***fmtcell  is  the  format  in  w.d  form  (e.g.  7.3)  for  *** 
***the  values  to  be  printed  in  the  table  cells.  *** 
***Note  that  this  macro  includes  *** 
***c : \windows\desktop\mortmorb\tabmac . sas .  *** 


‘/.include  '  c :  \windows\desktop\mortmorb\tabmac .  sas  ’ ; 

data  _null_; 

set  &covp  nobs  =  n; 

covdim  =  n; 

call  symput ( ’ covdim ’ , compress (covdim)  )  ; 

stop; 

run; 

/*****The  following  code  computes  the  final  probability  *** 
the  standard  deviations.  *** 

data  stdprob_; 

set  &covp  end=last; 

array  cov{&covdim}; 

lastvar  +  s\im(of  covl-cov&covdim) ; 

totprob  +  p; 

i  =  _n_; 

stdev  =  sqrt (cov(i) ) ; 
prob  =  p; 
if  last  then  do; 

Iprob  =  1.0  -  totprob; 

Istdev  =  sqrt (lastvar) ; 
call  symput ( ’ Iprob ’ , compress (Iprob) )  ; 
call  symput ( ’ Istdev ’ , compress (Istdev) ) ; 
end; 

keep  prob  stdev; 
run; 
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data  lastones; 

prob  =  felprob; 

stdev  =  felstdev; 

output ; 

stop; 

run; 

data  stdprob_; 

set  stdprob_  lastones; 

rim; 

/****Adding  the  row  and  column  for  the  left  out  cell*** 

*****to  the  row  column  list.**/ 

data  _rc_  ; 

set  &covp; 

keep  row  column; 

run; 

data  _lastrc_; 
set  _rc_  end  =  last; 
retain  maxj ; 

if  _n_  =  1  then  maxj  =  column; 
else  maxj  =  max (max j , column) ; 
if  last  then  do; 

if  column  <  maxj  then  column  =  column  +  1 ; 
else  row  =  row  +  1; 

end; 

if  last; 

keep  row  column; 
run; 

data  _rc_; 

set  _rc_  _lastrc_; 

run; 

Putting  probs ,  stdev,  row  and  column  into  data  set*** 
*:t::ic=x*;**:t!=t=suitable  f Or  use  in  the  tableit  macro.  **/ 

data  _final_; 
merge  stdprob_  _rc_; 
run; 

“/ot  able  it  ( _f  inal_ ,  row ,  column ,  ferowffflt ,  fecolfmt ,  fecolen ,  &fmt  cell , 

prob  stdev, time-to-onset ,time-to-death,&title) ; 

7omend ; 

/****Example* ******/ 


70 


yoinclude  ’  c :  \windows\desktop\mortmorb\matrx .  sas  ’ ; 

‘/omatrxCmm.rinf orm.rmats)  ; 

/*%matrx(niin. cinf orm.cmats)  ;*/ 

/*  estcov(matrix,estcov) 

pstdtab (covp , rowf mt , coif mt , colen , f mt cell , title) */ 

/*'/.estcov(rmats,rcov) ; 
options  notes; 

'/opstdtab(rcov,rrow,rcoluinn,10,8.6,Exposed  Joint  Probabilities);  */ 
’/.macro  f illcov(covp, filled)  ; 

***This  macro  taJces  the  covariance  matrix  *** 

**=i<for  a  vector  of  K  probabilities  of  a  K+1  *** 

♦♦^multinomial  and  fills  in  the  K+1  row  and  ♦♦♦ 

♦♦♦column  for  the  K+1  probability.  It  also  ♦♦♦ 

♦♦♦calculates  the  K+1  probability.  ♦♦♦ 

♦♦♦covp  in  a  SAS  input  data  set  containing  ♦♦♦ 

♦♦♦the  covariance  matrix  in  K  observatons  ♦♦♦ 

♦♦♦and  variables  covl  through  covK  aind  the  K^** 


♦♦♦probabilities  in  the  variable  p.  *** 
♦♦♦Variables  Row  and  colomn  will  contain  ♦♦♦ 
♦♦♦the  coordinates  for  the  rectangle  for  *** 
♦♦♦which  the  p  is  the  probability.  ♦♦* 


♦♦♦Filled  is  a  SAS  output  data  set  with  K+1  ♦♦* 
♦♦♦observations  and  variable  covl-cov(K+l)  ♦♦♦ 
♦♦♦containing  the  covariance  matrix  for  the  ♦♦* 
♦♦♦vector  of  K+1  probabilities  in  variable  p^^+ 
♦♦♦Row  and  colomn  will  be  updated  to  include*^^ 
♦♦♦the  coordinates  of  the  rectangle  for  the  ♦♦♦ 
♦♦♦last  probability.  ♦♦♦ 

/♦computing  last  row  and  column  numbers^/ 

data  _lastrc_; 

set  fecovp  end  =  last; 

retain  maxj ; 

if  _n_  =  1  then  maxj  =  column; 
else  maxj  =  max (maxj .column) ; 
if  last  then  do; 

if  column  <  maxj  then  column  =  column  +  1 ; 
else  row  =  row  +  1; 
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end; 

if  last ; 

keep  row  column; 
run; 

data  _null_; 
set  fecovp  nobs=n; 

call  symput ( ’ covdim ’ , compress (n)  )  ; 

stop; 

run; 

data  lastrow; 
set  &covp  end=last; 
array  .sum-C&covdim}; 
array  cov-[&covdim}; 
do  i  =  1  to  fecovdim; 

_sum(i)  +  cov(i) ;  /*summing  column  i  of  cov  matrix.*/ 

end; 

sump  +  p;  /*stunming  the  probabilities*/ 

if  last  then  do; 

do  i=l  to  fecovdim; 

cov(i)  =  -_sum(i) ;  /*computing  K+1  observation  for  covl-covK*/ 
end; 

p  =  1.0  -  sump;  /*computing  K+1  probability.  */ 

end; 

if  last ; 

keep  covl-cov&covdim  p; 
run; 

y,let  filldim  =  7.eval(&covdim  +  1); 

data  lastcol; 
set  lastrow; 
array  cov{&covdim} ; 

retain  covl-cov&covdim;  /*puts  sum  of  columns  of  original  cov  matrix*/ 
do  i  =  1  to  fecovdim;  /*into  a  column  under  the  name  cov&f illdim. */ 
cov&filldim  =  cov(i); 
output ; 

end; 

drop  i  p  covl-cov&covdim; 
run; 
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data  lastrow; 
set  lastrow; 
array  cov{&f illdim}; 
cov(&f illdim)  =  0; 
do  i  =  1  to  fecovdim; 

cov(&f illdim)  =  cov(&f illdim)  -  cov(i) ; 

/♦summing  all  original  cov  entries.  This  is  the  variance  of  the  estimate  for 

the  K+1  probability.*/ 

end; 

drop  i; 

run; 


data  lastrow; 

merge  lastrow  _lastrc_; 

run; 


data  fefilled; 

merge  &covp  lastcol; 

run; 

data  Milled; 

set  Milled  lastrow; 

run; 

'/.mend ; 

/♦example*/ 

/♦data  test; 

input  covl-cov7  p  row  column; 
cards; 

1234567.111 

2345671.112 

3456712.113 

4567123.121 

5671234.122 
6712345  .123 
7123456  .132 


run; 

°/ofillcov(test  jOutest) ; 

proc  print  data  =  outest  noobs; 

run; */ 

/♦♦♦Fixboth. sas*****/ 
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/=)=*=i<This  code  runs  Tnni!3  on  Exposed  and  control  data.  Feeding  output*** 
****formats  from  Exposed  run  into  the  run  for  comparisons.**/ 
options  ls=78  nonotes; 

libname  mm  ’c:\windows\desktop\mortmorb’; 

’/oinclude  '  c :  \windows\desktop\mortmorb\mm2 .  sas  ’ ; 

yomm2 (mm . exposed , mm . rprobs , mm . rpseu , dis , deth .times , . 0001 , 100 , 

c : \windows\deskt opXmortmorb , mm . rinf  orm , r) ; 
%mm2 (mm. control .mm. cprobs, mm. cpseu,rdis,rdeth, times, 

. 0001 , 100 , c : \windows\deskt opXmortmorb , mm . cinf orm  , c) ; 
/*7.mortmorb(mm. control.mm. cprobs.mm. cpseu,20,  .0001,100, 

cAwindowsXdeskt  opXmortmorb,  mm.  cinf  orm,  c)  ;  */ 
/****intprob2 . sas  3/30/98**/ 

"/.macro  initprob(inf  ile ,  outf  ile)  ; 

***This  macro  takes  an  infile  which  is  the  output  from  a*** 

***proc  freq  rim  and  produces  an  outfile  which  is  a  set  *** 


***of  probabilities  for  each  cell  of  the  table.  *** 
***The  table  created  is  such  that  removing  the  first  row*** 
***leaves  a  submatrix  which  haszero  counts  for  the  *** 
***triangle  below  the  diagonal.  *** 
***Cells  outside  of  the  lower  triangle  that  have  zero  *** 
***coimts  are  given  the  minimum  adjusted  count  of  1/n  or*** 
***sum/n  where  n  is  the  total  of  the  table  and  sum  *** 


***is  the  sum  of  the  actual  counts  of  all  cells  adjacent*** 

***to  the  empty  cell.  The  outfile  of  probabilities  will*** 

***be  the  adjusted  count  for  a  cell  divided  by  the  sum  *** 

***of  the  adjusted  coimts  for  all  the  cells.  *** 

***The  order  of  the  cells  is  that  of  the  order  of  a  two  *** 
***dimensional  array  in  SAS.  That  is,  cells  are  counted*** 

***across  columns  and  then  down  to  the  next  row.  *** 

***The  number  of  columns,  rows  and  the  total  number  of  *** 

***cells  are  put  into  global  macro  variables  named  *** 

***numcols,  numrows  and  total,  respectively.  *** 

"/.global  numcols  numrows  total  dwod; 

/**dwod  indicates  whether  or  not  the  first  row  of  the  matrix  *** 

***is  death  without  disease.  If  dwod  is  1  then  there  is  such*** 

***a  row  if  dwod  is  0  then  there  is  not .  **/ 

/**Next  data  step  put  the  appropriate  values  into  the  global  macros.***/ 
data  _null_; 
set  feinfile  nobs  =  n; 
by  obst ; 
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if  _n_  =  1  then  do; 

numcols  =  1 ; 
if  obst  =  .  then  do; 
dwod  =  1 ; 

end; 

else  do; 

dwod  =  0; 

end; 

call  symput (’ dwod compress (dwod)) ;  /*dwod  is  1  if  there  is  a  death 

♦♦without  disease  row  and  zero 
♦♦otherwise . ♦/ 

end; 

else  numcols  =  numcols  +  1; 
retain  niomcols; 
if  last. obst  then  do; 

numrows  =  n/numcols; 

call  symputC’ numcols’ , left (trim (numcols) ) )  ; 
call  symput(’numrows’ ,left(trim(numrows)))  ; 
call  symput (’ total ’ ,  left (trim (n) )) ; 
stop; 

end; 

run; 

/♦♦*The  following  data  set  puts  the  cell  index  (i,j)  in  the^^^ 

♦♦♦♦data.  Each  observation  has  then  a  cell  count  eind  the  ♦♦♦ 

♦♦♦♦cell  index.  ♦♦/ 

data  cells; 

set  feinfile  end  =  last; 
by  obst; 

i  =  _n_/&numcols  +  1; 
i  =  int(i) ; 

if  last. obst  then  i  =  i-1; 
if  first. obst  then  j  =  1; 
else  j  =  j+1; 
retain  j ; 
run; 

/♦♦♦♦We  will  not  remove  impossible  cells.  We  will  just  keep  them 
♦♦♦at  coxmt  =  0  and  prob  =  0.  It  makes  the  indexing  easier, 

♦♦♦although  more  inefficient.  ♦♦/ 

/♦♦The  next  data  step  takes  the  counts  for  each  cell  and  computes^^ 
♦♦♦the  sample  proportion  in  each  cell,  putting  the  sample  ♦♦ 

♦♦♦proportion  into  an  array  named  probs  with  the  appropriate  cell^^ 
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***index  */ 

data  probs; 
set  cells  end  =  last; 
by  obst; 

array  cnts{&numrows , fenumcols} ; 
array  probs{&nuinrows,&numcols3-; 

retain  cntsl-cnts&total 

probs l-probs&total ; 
cnts(i,j)  =  count; 
probs (i,j)  =  percent/100; 

sum  +  count ; 
if  last  then  do; 

call  symputC’n’ jleft (trim(sum))) ; 
end; 

if  last; 

drop  count  percent  obst  obsd; 
run; 

***Keep  in  mind  that  the  two  dimensional  arrays  are  numbered  *** 
***in  the  following  way  (i,j)  is  (i-l)*numcols+int (j/numcols  +  1)*** 
***for  jmod(numcols)  ne  0  and  i*nvimcols  if  jmod(numcols)  is  0.  *** 

***Now  we  must  adjust  the  probs  array  in  data  set  probs  so  that  *** 


***0  count  cells  get  small  probabilities  in  accordance  to  the  *** 
***counts  in  adjacent  cells.  *** 
***Keep  in  mind  that  we  will  keep  the  count  at  0  for  any  cell,  *** 
***(i,j)  where  i-&dwod  >  j,  because  these  axe  cells  in  which  *** 
***diseasetime  is  greater  than  death  time. 


data  feoutfile; 
set  probs; 

array  cnts{&numrows,&nvimcols}; 
array  probs{&numrows,&numcols}-; 
array  newp{&numirows,&numcols}; 
if  cnts(l,l)  =  0  then 

newp(l,l)  =  cnts(l,2)  +  cnts(2,l)  +  cnts(2,2); 
do  jj  =  2  to  7«eval(&numcols  -1); 
if  cnts(l,jj)  =  0  then  do; 
newp(l,jj)  =  0; 
do  ii  =  1  to  2; 
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do  jjj  =  jj-1  to  jj+1; 

newpCl.jj)  =  newpd.jj)  +  cnts(ii, j j j)  ; 
end; 

end; 

end; 

end; 

if  cntsd.&numcols)  =  0  then  do; 

newpCl ,&numcols)  =  0; 
do  jj  =  7oeval  (fenumcols  -1)  to  fenumcols; 

newpdj&niimcols)  =  newpCl ,  fenumcols)  +  cnts(2,jj); 

end; 

end;  /****row  1  is  now  taien  care  of.**/ 


do  ii  =  2  to  y,eval(fenumrows  -  1); 

do  jj  =  2  to  7, eval (fenumcols  -  1); 

if  cnts(ii,jj)  =  0  then  do; 
newp(ii,jj)  =  0; 

do  iii  =  ii-1  to  ii+1; 
do  jjj=  jj-1  to  jj+1; 

newpdi.jj)  =  newp(ii,jj)  cntsdii,  j  j  j) ; 
end; 
end; 

end; 


end; 

end; 


/*This  taJkes  care  of  all  cells  in  interior  of  table.**/ 


do  ii  =  2  to  7oeval(fen\imrows  -1); 

if  cnts(ii,l)  =  0  then  do; 
newp(ii,l)  =  0; 
do  iii=ii-l  to  ii+1; 
do  jjj  =  1  to  2; 

newpdijl)  =  newpdi.l)  +  cntsdii,  jjj) : 

end; 

end; 

end; 

end;  /*This  takes  care  of  the  first  column  up  to  second 

last  row.*/ 
if  cnts(fenumrows, 1)  =  0  then  do; 

newp(fenumrows,l)  =  0; 

do  iii  =  7oeval (fenumrows  -1  )  to  fenumrows; 
do  jjj  =  1  to  2; 
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newpC&numrows,!)  =  newpC&numrows , 1)  +  cntsCiii, j jj) ; 
end; 
end; 


end;/*This  takes  care  of  column  1  of  last  row.*/ 


do  ii  =  2  to  '/,eval  C&numrows  -  1); 

if  cntsCii ,&numcols)  =  0  then  do; 
newp(ii,&numcols)  =  0; 
do  iii  =  ii-1  to  ii+1; 

do  jjj  =  %eval(&numcols  -  1)  to  fenumcols; 
newpCii ,&numcols)  =  newpCii ,&numcols)  +  cntsCiii, jjj) ; 
end; 
end; 

end; 

end;  /**This  taikes  care  of  last  column  up  to  last  row*/ 

if  cnts(&numrows,&numcols)  =  0  then  do; 

newp(&n\imrows,&numcols)  =0; 

do  iii  =  Voeval (&numrows  -  1)  to  fenumrows; 
do  jjj  =  7,eval(&numcols  -  1)  to  fenumcols; 
newp (&numrows , fenumcols)  = 

newpC&numrows, &numcols)  +  cntsCiii, jjj) ; 

end; 

end; 

end; 

do  jj=2  to  y.evalC&n-umcols  -  1); 

if  cntsC&numrows, j j)  =  0  then  do; 
newpC&numrows, jj)  =  0; 
do  iii=  '/,eval C&nximrows  -1)  to  &numrows; 
do  jjj  =  jj-1  to  jj+1; 

newpC&numrows, jj)  =  newpC&numrows, jj)  +  cntsCiii , jjj) ; 
end; 
end; 

end; 

end;  /**This  takes  care  of  last  row  cols  2  to  second  last  col.*/ 

/♦**Now  all  0  cells  should  be  done.**/ 


do  jj  =  1  to  &numcols; 

if  cntsCljjj)  ne  0  then  newpCl,jj)  =  cntsCl,jj): 
if  newpCljjj)  =  0  then  newpCl,jj)  =  1.0/sum; 


end; 
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do  ii  =  2  to  fenumrows; 

do  jj  =  ii-1  to  fenumcols; 

if  cnts(ii,jj)  ne  0  then  newp(ii,jj)  =  cnts(ii,jj); 
if  newp(ii,jj)  =  0  then  newp(ii,jj)  =  1.0/sum; 

end; 

end;  /**Now  new  p  contains  either  cell  coimt  if  it  is  nonzero, 
and  a  small  num  that  is  at  least  1/n  and 
at  most  1**/ 

/*!ic*The  following  sets  the  probs  of  impossible  cells  back  to  0.*** 
**>(=*Notice  that  if  dwob  is  1  then  there  is  a  first  row  with  *** 
****disease  missing  and  thus  the  impossible  cells  start  below  *** 
****xo-v  3.  If  dwob  is  0  then  the  entire  matrix  has  lower  *** 

^♦★♦diagonal  of  impossible  cells.  **/ 


do  ii  =  2+&dwod  to  &numrows; 
do  jj  =  1  to  ii-l-&dwod; 

newp(ii,jj)  =  0; 
end; 

end; 

newsum=0 ; 

do  ii=  1  to  fenumrows; 

do  jj=  1  to  feniimcols; 

newsum  =  newsum  +  newp(ii,jj); 

end; 

end; 

do  ii=  1  to  fenunirows; 

do  jj=l  to  fenumcols; 

newp(ii,jj)  =  newpCii, j j)/newsum; 

end; 

end;  /**Now  newp  contains  adjusted  probabilities**/ 

keep  newpl-newp&total ; 

run; 

"/.mend ; 

/*y.initprob(newprobs)  ;*/ 

“/omacro  lethalty(covp,outf ile, coif mt, title, collab)  ; 
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***This  macro  computes  the  of  a  disease  for  a  discrete  set  *** 
***of  time  intervals,  using  the  joint  distribution  of  *** 
***time-to-onset  of  disease  and  time-to-death.  It  assumes  *** 
***that  time-to-onset  and  time-to-death  have  been  *** 
♦♦^partitioned  into  intervals  and  a  matrix  of  rectangles  ♦♦♦ 
♦♦♦have  been  formed  by  these  intervals.  Time-to-onset  is  ♦♦♦ 
♦♦♦represented  by  the  rows  of  the  matrix,  the  first  of  ♦♦♦ 
♦♦♦is  a  row  for  death  known  to  have  occurred  before  without^^^ 
♦♦♦disease.  Time-to-death  intervals  form  the  columns  of  ♦♦♦ 
♦♦♦laatrix  of  rectangles .  The  row  and  column  intervals  must^^^ 
♦♦♦be  the  same  although  there  may  be  more  intervals  for  ♦♦♦ 
♦♦♦time  to  death  than  for  time-to-onset.  So  the  last  ♦♦♦ 
♦♦♦interval  for  time-to-onset  will  have  a  left  endpoint  ♦♦♦ 
♦♦♦which  coincides  with  one  of  the  left  endpoints  for  ♦♦♦ 
♦♦♦time-to-death,  but  may  be  a  collapsing  of  a  number  of  ♦♦♦ 
♦♦♦last  intervals  for  time-to-death.  ♦♦♦ 
♦♦♦The  name  of  the  input  SAS  data  set  in  put  into  argument  ♦♦♦ 
♦♦♦covp.  ♦♦♦ 
♦♦♦Covp  has  K  observations  where  K  is  the  number  of  ♦♦♦ 
♦♦♦rectangles  in  the  matrix  that  have  nonzero  probability.  *♦* 
♦♦♦Covp  has  variables  covl-covK,  which  are  the  columns  for  ♦♦* 
♦♦♦the  covariance  matrix  of  the  K  values  in  the  variable,  ♦♦♦ 
♦♦♦p.  The  variable, p  contains  the  K  estimates  of. the  joint^^^ 
♦♦♦probabilities  for  the  rectangles.  The  ordering  of  the  ♦♦* 
♦♦♦rectangles  and  thus  the  joint  probabilities  is  the  same  ♦♦♦ 
♦♦♦as  the  conventional  double  cirray  ordering  in  SAS.  That  ♦♦♦ 
♦♦♦is  the  coimt  start  at  the  first  row  first  column  and  ♦♦♦ 
♦♦♦procedes  right  through  the  coliimns  and  then  bac  down  to  ♦♦♦ 
♦♦♦the  next  row.  The  submatrix  left  after  eliminating  the  ♦♦♦ 
♦♦♦first  row  has  all  zero  joint  probabilities  in  the  lower  ♦♦♦ 
♦♦♦triangle,  since  these  rectangles  represent  knowing  that  ♦♦♦ 
♦♦♦disease  occured  after  death  and  knowing  exactly  what  ♦♦♦ 
♦♦♦time  interval  the  disease  occured.  These  rectangle  are  ♦♦♦ 
♦♦♦not  included  in  covp.  Covp  also  contains  variables  row  ♦♦♦ 
♦♦♦and  column  which  indicate  the  coordinates  of  the  joint  ♦♦♦ 
♦♦♦probability  in  the  matrix  of  rectangles.  ♦♦* 
♦♦♦In  matrix/vector  form  the  K  observations  for  covl-covK  ♦♦* 
♦♦♦represent  the  covariance  matrix  for  the  K  observations  ♦♦♦ 
♦♦♦in  the  column  vector  p. 

♦♦♦Lethality  is  computed  for  each  time-to-death  interval.  ♦♦♦ 
♦♦♦It  is  the  sum  of  the  joint  probabilities  for  rows  2  to  ♦♦♦ 
♦♦♦the  maximum  time-to-onset  interval  less  than  or  equal  to^^^ 
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=i'=i'=i'the  death  interval  divided  by  the  joint  probability  of  *** 
***the  first  row  (death  w/o  disease)  in  that  death  interval*** 
***In  other  words,  the  lethality  for  interval  j  is  the  *** 
***suin  of  the  probabities  of  getting  the  disease  up  to  and  *** 
***including  interval  j  and  dying  in  interval  j  divided  by  *** 
***the  probability  of  dying  in  interval  j  without  disease.  *** 
***0utfile  will  be  a  SAS  data  set  that  contains  the  *** 

***the  lethalities  in  a  variable  L,  and  the  covaricince  of  *** 
***the  lethalities  in  columns  Lcovl-LcovJ,  where  J  is  the  *** 
***number  of  time-to-death  intervals (i . e .  the  number  of  *** 
***columns) .  Colfmt  is  the  format  for  assinging  the  column*** 
***intervals  to  the  column  numbers.  *** 

***Title  is  used  to  identify  the  data  set.  It  will  be  *** 
***the  third  line  in  the  output  print  out  title.  The  first*** 
***two  lines  of  the  title  are  'Column  Lethalities  and  *** 

***Covariance  Matix'  and  'for'  .  Collab  is  the  lable  to  be*** 
***given  to  the  column — e.g.  time-to-death.  *** 

***The  covariance  matrix  for  the  vector  of  lethalities  is  *** 
***computed  using  the  delta  method.  The  vector  of  *** 

***lethalities  is  a  function  of  the  p  vector  from  the  input*** 
***SAS  data  set  covp.  So  the  asymptotic  covariance  matrix  *** 
***for  the  vector  of  lethalities  is  a  product  of  the  *** 

***gradient  matrix  of  the  Lethality  vector  times  the  matrix*** 
***in  covl-covK  times  the  transpose  of  the  gradient  matrix.*** 

***The  following  data  steps  compute  the  numerators  of  the  *** 
***lethalities  and  stores  them  with  the  denominators  in  a  *** 
**=(tdata  set  _ntims_ .  *** 

data  _null_; 

set  fecovp  nobs  =  K  end  =  last; 
retain  maxrow  maxcol; 
if  _n_  =  1  then  do; 
maxrow  =  row; 
maxcol  =  column; 

end; 

else  do; 

maxrow  =  max (maxrow, row) ; 
maxcol  =  max (maxcol, column) ; 

end; 

if  last  then  do; 
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call  symputC’K’ ,compress(K)) ; 

call  symput  ( ’rownxim’ ,  compress  (maxrow) )  ; 

call  symputC’colnxim’ ,  compress  (maxcol) )  ; 

end; 

run; 


data  _in_; 

set  &covp  end=last; 

array  num{&colnum} ;  /**For  numerator  of  column  lethalities.*/ 
array  den{&colnuim} ;  /**For  denominator  of  column  lethalities.*/ 
retain  numl-num&colnum  denl-den&colnum; 
if  _n_  =  1  then  do; 

do  j  =  1  to  fccolnum; 
nTim(j)  =  0; 


end; 


end; 


if  row  =  1  then  den (column)  =  p; 
else  num(column)  =  num(coluinn)  +  p; 


if  last; 

keep  nvunl-num&colnxim  denl-den&colnum; 
run; 

***The  following  data  set  computes  the  gradient  matrix  for  the  *** 
***lethality  vector.  The  gradient  matrix  has  colnum  rows  and  *** 
***K  columns .  *** 

***Most  colvimns  for  a  row  are  zero.  The  jth.  row  has  a  nonzero*** 
***entrie  in  column  j,  which  is,  in  the  variables  of  _in_,  *** 

***-nuimj/(denj*denj)  .  The  other  nonzero  entries  are  1/denj  cUid*** 
**=i<appear  in  col\imns  corresponding  to  the  matrix  coordinates  *** 
***(min(2,rownum) , j)  to  (min(j+l,rownum) , j) .  *** 

***The  column  corresponding  to  coordinates  (i,j)  is  *** 

=  (i-l)*colnum  +  j  -(i-2)*(i-l)/2.  *** 

data  _prime_; 
set  _in_; 

array  num{&colnum}; 
array  den-C&colnum} ; 

retain  numl-num&colnum  denl-den&colnum; 
do  j  =  1  to  &colnum; 

sum  =  num(j) ; 
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pi  =  den(j) ; 
output ; 

end; 

drop  numl-num&colnum  denl-den&colnum; 
run; 

data  f prime; 
set  _prime_; 
array  f{&k}; 
do  j  =  1  to  &k; 
f(j)  =  0; 

end; 

f(_n_)  =  -siuii/(pl*pl) ; 

do  i  =  min(2,&rownum)  to  min(_n_  +  l,&rownum); 
s  =  (i-l)*&colnum  +  _n_  -  (i-2) *(i-l)/2; 
f(s)  =  1.0/pl; 

end; 

drop  i  j  s; 
rim; 

data  f prime; 
set  f prime; 
leth  =  sum/pl; 
drop  sum  pi ; 
run; 

/♦♦preparing  matrices  for  use  in 

data  _cov_; 
set  fecovp; 
keep  covl-cov&k; 
run; 

data  _temp_; 
set  f prime; 
keep  fl-f&k; 
rim; 

proc  iml; 

/♦♦♦putting  the  fprime  matrix  into  a  matrix  and  lethality  vector*** 
♦♦**into  a  vector.  **/ 

use  _temp_; 


read  all  var  _num_  into  _f prime.; 
use  _cov_; 

read  all  var  _num_  into  _covp_; 

covleth  =  _fprime_=i=_covp_*_fprime_‘ ; 
create  newcov  from  covleth; 
append  from  covleth; 
quit ; 

data  newcov; 
set  newcov; 
array  col{&colnum} ; 
ar-ray  lcov{&colnum} ; 
do  i  =  1  to  fecolnm; 

lcov(i)  =  col(i) ; 

end; 

keep  Icovl-lcov&colnum; 
run; 

data  _leth_; 
set  f prime; 
coliimn  =  _n_; 
keep  leth  column; 
riin; 

data  ftoutfile; 

lable  leth  =’ lethality'  column  =  fecollab; 

merge  _leth_  newcov; 

rxm; 

proc  print  data  =  feoutfile  label; 

titlel  'Column  Lethalities  and  Covariance  Matix' ; 

title2  'for'; 

titled  fetitle; 

format  column  fecolfmt. . ; 

var  column  leth  Icovl-lcov&colnum; 

run; 

%mend ; 

/* 

'/olethalty  (rf  illd ,  rout ,  rcolumn , Exposed ,  Time-to-Death)  ; 
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‘/olethalty  (cf  illd ,  cout ,  rcoliimn ,  Controls ,  Time-t o-Death) ;  */ 

/  .  sas****/ 

/♦***Th.is  code  computes  and  tables  the  lethalities  for  Exposed  *** 
♦**=ic*and  comparisons  by  time-to-death  interval  and  tests  for  equality*** 
*****of  the  two  vectors  of  lethality.  *** 

‘/oinclude  ’  c :  \windows\desktop\mortmorb\lth2quad .  sas  ’ ; 

’/oinclude  ’  c :  \windows\desktop\mortmorb\quaddif  f .  sas  ’ ; 

'/.include  ’  c :  \windows\desktop\mortmorb\lethalty .  sas  ’ ; 

'/.lethalty  (rf  illd ,  rout , rcolumn , Exposed ,  Time-to-Death)  ; 

'/.lethalty  (cf  illd ,  cout ,  rcolumn ,  Controls ,  Time-t  o-Death) ; 

'/,lth2quad  (rout ,  inr) ; 

‘/,lth2quad(cout ,  inc) ; 

'/.quaddiff  (inr, inc,rrow, rcolumn, lethality, Exposed, Controls) ; 
y.macro  lth2quad(lethin,lethout) ; 

***This  macro  takes  the  output  from  lethalty*** 


***and  puts  it  into  the  form  for  input  to  *** 
***quaddif f .  *** 
***Input  data  set  lethin  is  output  from  *** 
***lethalty  and  lethout  will  be  the  output  *** 
=i<**data  set  suitable  for  immediate  input  to  *** 
***diffquad.  *** 


data  _null_; 

set  ftlethin  nobs  =  n; 

call  symput (’n’ , compress (n)) ; 

stop; 

rxin; 

data  felethout; 
set  felethin; 
array  lcov{&n}; 
array  cov{&n}; 
do  i  =  1  to  &n; 

cov(i)  =  lcov(i); 

end; 

p  =  leth; 
row  =  0; 
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keep  p  row  column  covl-cov&n; 
run; 

%mend ; 

/^♦example**/ 

/* 

"/.include  ’  c :  \windows\desktop\mortmorb\lth2quad .  sas  ’ ; 

'/.include  ’  c :  \windows\desktop\mortmorb\quaddif  f  .  sas  ’ ; 

‘/.include  ’  c :  \windows\desktop\mortmorb\lethalty .  sas  ’ ; 

’/.lethalty  (rf  illd ,  rout ,  r column , Exposed ,  Time-to-Death)  ; 
'/.lethaltyCcfilld,  cout  .rcolumn, Controls  ,Time-to-Death)  ; 
'/.lth2quad(rout,inr)  ; 

'/.lth2quad(cout,inc)  ; 

‘/.quaddiff(inr,inc, rcolumn, lethality, Exposed, Controls)  ;  */ 
/*******+**'niis  code  tcikes  a  data  set  containing 

arrays  im(inf dim, infdim) ,  newp(infdim+l) ,  ni(infdim+l)  and  nj (inf dim+1) 
and  puts  them  into 

row/column  matrix  form  with  im(i,l)-im(i,infdim)  the  ith.  row  where 
cols  aire  named  coll-colinf dim.  newp(i)  is  the  value  of  a  variable 
name  p  in  the  ith  observation.  Row  =  ni(i)  cind  column  =  nj(i)  are  the 
row  and  column  rectangle  for  which  p  is  the  probability.  The  last  newp, 
ni  and  nj  values  are  not  included  since  the  last  p  value  is  1  minus  the 
sum  of  the  first  infdim  p  values,  ni (infdim  +  1)  =  ni (infdim)  and 
nj (infdim  +  1)  =  nj (inf dim)  +  1,  assuming  that  there  is  a  row  for  disease 
known  not  to  have  occurred  before  death. 

/**remember  that  there  is  one  more  probabilitiy  the  last  probability  is 
1  minus  the  sum  of  the  infdim  p’s.***/ 

‘/.macro  matrx(in,out)  ; 

data  _null_; 
set  &in; 

newpsize  =  infdim  +  1; 

call  symput( ’newpsize’ , compress (newpsize) ) ; 
call  symput ( ’ inf dim’ .compress (infdim)) ; 
stop; 
run; 

data  &out; 
set  &in; 

array  im{&infdim,&infdim} ; 
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array  col{&infdim}; 
array  newp{&newpsi2e}; 
array  ni{&newpsize};  /**new**/ 
array  nj{&newpsize};  /♦♦new**/ 
do  i  =  1  to  &infdim; 

do  j=  1  to  &infdim; 
col(j)  =  im(i, j) ; 

end; 

p  =  newp(i) ; 
row  =  ni(i);  /*new*/ 
column  =  nj (i) ;  /*new*/ 
output ; 
end; 

keep  coll-col&infdim  p  row  column;  /*new*/ 
run; 

%mend ; 

/ *mat rx ( in , out ) * / 
y,matrx (mm .  rinf  orm , rmat s)  ; 

/ *****mm9 . sas* ♦♦♦♦♦/ 

/*:(c**:(!***********j(t*=it*Mortality  Morbidity  Analysis  Software .  ♦*♦♦♦♦♦♦♦♦/ 

/♦**:*!  ^****:t!!t;*5t;**:)c5(c*j|c****s(!!)c****:t;*****!t:  ♦:*!!£*♦♦***♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦ 

♦♦♦The  main  driver  macro  named  mortmorb  requires  a  data  file  that  ♦♦♦ 


♦♦♦has  mortality  morbidity  data  in  the  following  form.  *♦♦ 
♦♦♦Time  to  disease  and  time  to  death  each  have  five  variables  *♦♦ 
♦♦♦associated  with  them.  Two  of  the  variables  are  censoring  ♦♦♦ 
♦♦♦indicators  and  the  other  three  are  the  available  information  ♦♦♦ 
♦♦♦about  the  time  to  event  (disease  or  death) .  *** 


♦♦♦The  three  variables  that  indicate  the  known  value  of  disease  are*** 
♦♦♦obst,  obstl  and  obstr.  Obstl  and  obstr  are  the  left  and  right  ♦♦♦ 
♦♦♦interval  endpoints  if  time  to  disease  is  doubly  censored.  That  ♦♦♦ 
♦♦♦is,  if  time  to  disease  is  known  only  to  fall  into  an  interval  *♦♦ 
♦♦♦then  that  interval  is  (obstl, obstr) .  *** 
♦♦♦In  this  case  obst,  which  is  the  observed  value  of  time  to  ♦♦♦ 
♦♦♦disease  is  ,  because  it  is  unobserved  and  considered  missing. ♦♦ 
♦♦♦If  time  to  disease  is  observed  or  singly  censored  then  the  value*** 
♦♦♦or  the  censored  value  is  in  obst,  while  obstl  and  obstr  will  be  ♦♦♦ 
♦♦♦’.’  .  The  censoring  indicators  for  time  to  disease  must  be  ♦♦♦ 
♦♦♦named  itL  and  itR.  ItL  =  1  if  time  to  disease  is  left  censored. ♦♦♦ 
♦♦♦That  is  if  max (time  to  disease, lower  censoring  variable)  =  ♦♦♦ 
♦♦♦lower  censoring  value.  In  this  case  we  observe  the  value  of  the*** 
♦♦♦censoring  variable.  ItL  =  0  if  the  maximum  is  equal  to  time  to  ♦♦♦ 
♦♦♦disease.  ItR  =  1  if  time  to  disease  is  right  censored.  That  is  ♦♦♦ 
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***if  minCtime  to  disease .upper  censoring  variable)  =  censoring  *** 
♦♦^variable .  In  this  case  we  observe  the  value  of  the  censoring  *** 

♦♦♦variable.  ItR  =  0  if  the  min  above  is  the  time  to  disease.  *** 

♦**If  ItL  =  0  and  ItR  =0  then  obst  is  the  actual  time  to  disease.  ♦** 

♦♦♦If  ItL  =  1  and  ItR  =0  then  the  actual  time  to  disease  is  less  *** 

♦♦♦than  obst.  *** 

♦♦♦If  ItL  =  0  and  ItR  =1  then  the  actual  time  to  disease  is  greater*** 
***than  obst .  *** 

***In  all  of  the  cases  above  obstL  and  obstR  will  be  ’ . ’  because  *** 
***none  of  these  cases  is  double  censored.  *** 

***If  ItL=l  and  ItR  =  1  then  obst  =  ’ . ’  and  obstl  <=  actual  disease*** 
***time  <=  obstr.  *** 

***If  death  occured  before  disease  then  obst,  obstl  and  obstr  are  *** 
♦♦♦all  and  Itl  =  0  and  ItR  =1.  This  indicator  pattern  is  used  *** 
♦♦♦because  technically  death  has  censored  disease  time  of  the  right.** 
***The  variables  related  to  time  to  death  are  IdL.IdR,  obsd,  obsdL  *** 
:(c**and  obsdR.  They  are  defined  similarly  to  those  for  disease.  *** 

♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦/ 
’/oglobal  numtimes;  /*contains  number  of  disease  and  death  times 

to  be  used  to  form  the  lattice  for  disease 
time  by  death  time.*/ 

’/.include  ’  c :  \windows\desktop\mortmorb\cellsgn2 .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\classif y .  sas  ’ ; 

’/.include  ’ c :  \windows\desktop\mortmorb\em. sas ’  ; 

’/.include  ’  c ;  \windows\desktop\mortmorb\intprob2 .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\time5two .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\realf  req .  sas '  ; 

’/.include  ’  c :  \windows\desktop\mortmorb\revzero .  sas  ’ ; 

’/.include  ’ c : \windows\desktop\mortmorb\newinf rm.  sas '  ; 

’/.macro  mm2  (datafile ,  outprobs ,  pseudo ,  disf  mt ,  dethf  mt ,  intimes , 
tol, its, libdrive, inform  ,pref ) ; 

/*:4:**:t!^^^^^^^^^^^^^^^^^************************************************ 

***Datafile  is  a  mortality  morbidity  file  with  itL,  itr  as  the  left  *** 
***cuid  right  censoring  indicators  for  time  to  disease,  obst  as  the  *** 
**^observed  time  to  disease,  which  is  either  the  actual  time  or  the  *** 
***right  or  left  censoring  time  depending  on  the  censorship.  If  ItL*** 
♦♦♦and  itR  are  both  1,  indicating  double  censoring  then  obst  is  *** 

♦♦♦and  obstl,  obstr  contain  the  left  and  right  endpoints  of  the  *** 

♦♦♦observed  interval  for  time  to  disease.  If  obst  is  because  *** 
♦♦♦death  occurred  before  disease  then  itl  =  0  and  itr  =  1.  IdL,  idr*** 
♦♦♦obsd,  obsdL,  obsdr  are  defined  similarly  for  the  time  to  death,  *** 
♦♦♦except,  of  course,  that  death  will  not  be  cesored  by  disease,  so  ♦** 
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♦♦♦death  time  should  never  be  missing.  ♦♦♦ 

♦♦♦outprobs  is  the  data  set  containing  the  final  em  algorithm  ♦♦♦ 

♦♦♦probability  estimates  for  the  cells  defined  by  the  formats  disfmt,^^^ 
♦♦♦for  rows  and  dethfmt  for  column,  saved  in  library.  ♦♦♦ 

♦♦♦These  formats  were  presumable  built  on  control  set  of  ♦♦♦ 

♦♦♦mortality/morbidity  data  using  the  macro  mortmorb.  ♦♦♦ 

♦♦♦pseudo  is  the  data  set  containing  the  final  pseudo  data  coimt  for^^^ 
♦♦♦the  cells  defined  by  diint  and  deint.  ♦♦♦ 

♦♦♦Intimes  is  the  data  set  containing  tl  -  t&numtimes  which  are  the  ♦♦♦ 
♦♦♦times  used  for  the  column  intervals.  The  row  intervals  are  made  ♦♦♦ 
♦♦♦up  of  endpoints  which  are  a  subset  of  intimes.  ♦♦♦ 

♦♦♦Tol  is  the  value  for  the  sum  of  differences  squared  between  ♦♦♦ 


♦♦♦the  successive  probability  steps  of  the  em  algorithm  that  will  ♦♦♦ 
♦♦♦stop  algorithm  if  the  sum  of  squared  differences  is  less  thcui  it.^^^ 


♦♦♦its  is  the  maximum  niimber  of  iterations  of  the  em  algorthm  that  ♦♦♦ 
♦♦♦is  allowed.  ♦♦♦ 
♦♦♦The  information  matrix  for  the  resulting  set  of  probabilities  is  ♦♦♦ 
♦♦♦in  a  data  set  named  inform  in  an  array  named  im  of  ♦♦♦ 
♦♦♦square  dimension  inf dim^infdim  where  infdim  is  the  number  of  ♦♦♦ 
♦♦♦nonzero  cell  probabilities  (the  upper  diagonal)  minus  1.  ♦♦♦ 
♦♦♦Inform  also  contains  the  final  probability  estimates  in  an  array  ♦♦♦ 
♦♦♦named  newpl-newp (infdim) .  In  addition  inform  contains  variables  ♦♦♦ 
♦♦♦numcols,  numrows,  and  infdim  which  are  the  number  of  columns,  ♦♦♦ 
♦♦♦rows  and  the  dimension  of  the  information  matrix.  Finally  inform^^^ 
♦♦♦contains  arrays  Nil-Ni (infdim)  and  Nj 1-Nj (inf dim) ,  which  contain  ♦♦♦ 
♦♦♦the  indices  in  the  matrix  of  rectangles  for  the  newp  array.  ♦♦♦ 
♦♦♦pref  is  one  character  prefix  to  be  given  to  data  sets  final  ♦♦♦ 
♦♦♦formats  for  death  and  time  to  onset  of  disease.  The  format  for  ♦♦♦ 
♦♦♦the  disease  intervals  will  be  fepref.dis  and  for  death  fcpref .deth.^^^ 


yorealfreq(&datafile,freqout  ,&disfmt,  fedethfmt,  felibdrive)  ; 

7oinitprob(freqout  jinitprob) ;  /♦♦freqout  the  outfile  from  realfreq. 

initprob  is  a  set  of  initial  probabilities  for 
the  em  cells. ♦♦/  /♦in  intprob2^/ 

yocellsgn(&intimes,&datafile,outmm,&numrows,&numcols,&dwod) ;  /♦  outmm 

is  the  output  which  is  the  mortmorb 
file  with  observed  and  censoring 
observations  classified  by  row  if 
disease  cind  column  if  death.  Numrow 
and  numcol  are  fenumrows  and  ftnumcols 
which  are  created  by  initprob. ♦/ 

/♦in  cellsgn2. sas^/ 
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%classify(outmin,outnnn,&dwod)  ; 

%em  (outimn ,  initprob ,  ftpseudo ,  feoutprobs ,  fenumrows ,  fenumcols ,  &t  otal ,  &:tol ,  &it  s 

,&dwod) ; 


/*  em  is  in  em.sas*/ 

%let  numtimes  =  7,eval(&numcols  -  1); 

°/onextzero (feoutprobs , feint imes , fenumrows , fenumcols , fenumt imes , newp , feoutprobs .times) ; 

/♦next  zero  is  in  revzero . sas*/ 

'/oif  7,eval(feyes)  =  1  "/otHen  7odo;  /*if  outprobs  and  times  have  changed 

♦♦then  we  rerun  the  em  process  with  the 
♦♦new  matrix  of  probs  and  array  of  times. ♦/ 
°/oput  Warning  Rectangles  have  been  combined; 

°/olet  numrows  =  fenewrows; 

%let  numcols  =  fenewcols; 

%let  total  =  fenewtot; 

“/oCellegnCtimes, fedatafile.outmm, fenumrows, fenumcols, fedwod)  ;  /♦times  is  the 

file  of 

timesfrom  eminvals.  datafile  is  the 
mortmorb  file,  outmm 
is  the  output  which  is  the  mortmorb 
file  with  observed  and  censoring 
observations  classified  by  row  if 
disease  and  column  if  death.  Numrow 
and  nvimcol  are  fenumrows  and  fenumcols 
which  are  created  by  initprob. ♦/ 

'/classif y (outmm, outmm, fedwod)  ; 

7olet  numtimes  =  fenewtime; 

7«em  (outmm ,  feoutprobs ,  fepseudo ,  feoutprobs ,  fenumrows ,  feniimcols ,  fetotal ,  fetol ,  feit  s , 

fedwod) ; 


7.end; 

7oddf  ormat  (times ,  fenumt  imes  .fenumrows  .fenumcols  ,fepref)  ; 

/♦♦This  was  moved  from  within  the  above  loop 
to  accomodate  a  change  in  times  from  the  proc  freq 
as  well  as  from  the  nextzero^^/ 

data  fepref. times; 
set  times; 
rim; 

7oeminfo (fepseudo, fenumrows, fenumcols, feinform)  ;/^in  newinfrm. sas^/ 

7omend ; 

/  ♦mortmorb  (dataf  ile ,  outprobs ,  pseudo ,  intobs ,  t  ol ,  it  s ,  libdrive ,  pref )  ♦/ 
/♦7omortmorb  (mm .  cancer ,  f nlprobs ,  f nlpseu ,  20 , .  0001 , 100 , 

c : \windows\deskt op\mortmorb , inf  orm , p) ; ♦/ 
/♦♦newinform. sas  3/30/98^^^/ 


7, macro  eminfo(pseudata,nrows,ncols, inform)  ; 

***This  macro  takes  final  em  contributions  for  *** 
***a  set  of  rectangles  defined  by  a  matrix  with  *** 
***nrows  intervals  for  the  vertical  axis  *** 

♦♦♦and  ncols  intervals  for  the  horizontal  aocis.^^^ 
♦♦♦The  vertical  axis  corresponds  to  time  to  ♦♦♦ 
♦♦♦onset  of  disease.  The  first  row  is  the  event  ♦♦♦ 
♦♦♦of  no  disease  onset  before  death.  Technically^^^ 


♦♦♦then  disease  onset  time  is  known  only  to  have  ♦♦♦ 
♦♦♦occured  after  death.  The  colximns  are  ♦♦♦ 

♦♦♦intervals  for  death  times.  The  data  can  be  ♦♦♦ 

♦♦♦looked  at  as  pseudo  counts  for  cells  of  a  ♦♦♦ 

♦♦♦matrix  with  rows  being  disease  onset  times  and^^^ 
♦♦♦columns  being  death  times.  ♦♦♦ 

♦♦♦These  pseudo  counts  are  expected  values  for  ♦♦♦ 
♦♦♦the  actual  cell  in  which  the  observation  falls^^^ 
♦♦♦given  the  partial  information.  ♦♦♦ 

♦♦♦For  each  observations  the  cell  counts  must  ♦♦♦ 
♦♦♦add  to  one.  ♦♦♦ 

♦♦♦Using  Loius(1982)  these  pseudo  counts  are  used^^^ 
♦♦♦to  estimate  the  information  matrix  for  the  ♦♦♦ 
♦♦♦MLE’s  of  the  the  probabilities  of  the  cells,  ♦♦♦ 
♦♦♦which  were  obtained  from  the  em  algorithm.  ♦♦♦ 
♦**Pseudata  also  contains  the  final  estimates  of  ♦♦♦ 
♦♦♦the  cell  probabilities.  The  pseudo  data  must  ♦♦♦ 
***be  in  a  SAS  array  ps(&nrows,&ncols) .  ♦♦♦ 

♦♦♦The  final  probabilities  must  be  in  a  SAS  array^^^ 
♦♦♦np((&nrows,&ncols) .  The  information  ♦♦♦ 

♦♦♦matrix  will  be  in  an  array  called  ♦♦♦ 

♦♦♦im((s,s)  where  s  is  the  number  of  possible  ♦♦♦ 
♦♦♦nonzero  rectangles  minus  1.  The  minus  1  ♦♦♦ 

♦♦♦reflects  the  fact  that  the  ♦♦♦ 

♦♦♦probabilities  must  add  to  1.  The  inf ormation^^^ 
♦♦♦matrix,  the  vector  of  nonzero  probabilities  ♦♦♦ 
♦  ♦♦and  the  number  of  rows  and  coliimn  in  the  ♦♦♦ 

♦♦♦disease  by  time  matrix  and  the  number  of  ♦♦♦ 

♦♦♦nonzero  probabilities  will  be  in  the  data  set  ♦♦♦ 
♦♦♦inform  in  im(infdim,infdim) ,nps(infdim+l)  and  ♦♦♦ 
♦♦♦numrows  numcols  and  inf dim.  ♦♦♦ 

♦♦♦The  data  set  inform  will  also  contains  two  ♦♦♦ 

♦♦♦other  arrays  ni{inf dim+l}  and  nj-Cinf dim+l} .  ♦♦♦ 
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***ni(s)  and  nj (s)  will  the  the  disease  and  death*** 
***intervals  for  the  original  matrix  for  the  *** 
***newp(s)  the  sth.  rectangle  probability  in  the  *** 
***upper  diagonal  matrix,  newp(s)  appears  in  the*** 
***inform  data  set  also.  *** 

***For  example,  letting  invim{infdim, infdim}  be  *** 

***the  inverse  of  the  information  matrix,  im,  the*** 
***entry  invim(s,t)  is  the  estimate  of  the  *** 

***covariance  of  newp(s)  and  newp(t),  which  *** 

***in  terms  of  the  original  matrix  of  rectcingles  *** 

***is  the  covariance  between  np(ni(s) ,nj (s) )  and  *** 
***np(ni(t) ,nj  (t) )  .  *** 

***The  null  data  step  sets  the  total  number  of  *** 
***nonzero  disease  time  by  death  time  cells.  *** 

***The  first  row,  which  is  the  row  of  death  times*** 

***for  death  before  disease  has  no  impossible  *** 
***cells.  The  matrix  formed  by  the  disease  by  *** 
***death  intervals  minust  the  first  row  is  upper  *** 
**!*triangular  in  the  sense  that  the  cells  below  *** 

***the  diagonal  correspond  to  death  before  *** 

***disease  with  disease  time  known — this  can  not  *** 
***happen  so  these  cells  have  zero  probability.  *** 

***The  macro  variable  total  below  is  the  niomber  *** 

*=K*of  cells  in  the  first  row  plus  the  number  of  *** 
***upperdiagonal  and  diagonal  cells  for  the  rest  *** 

***of  the  disease  by  death  matrix.  *** 

***Since  the  cell  probabilities  must  add  to  1  the*** 
***dimension  of  the  vector  of  cell  probabilities  *** 

***is  total  -1,  which  is  put  into  the  macro  *** 

***variable  infdim.  The  information  matrix  for  *** 

***the  cell  probabilities  is  the  square  of  infdim*** 
***which  is  put  into  the  macro  variable  inf size.  *** 

data  _null_; 

total  =  &nrows*&ncols  -  (&nrows-2)*(&nrows  -l)/2.0  ; 
infdim  =  total  -  1; 

inf size  =inf dim*infdim;  /**number  of  entries  in  info  mat*/ 
call  symputC 'total ^ , left (trim (total))) ; 
call  symput  ( ’  infdim’ ,  left  (trimdnf dim)  )  )  ; 
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call  sjmiput  ( ’  inf  size  ’ ,  left  (trim(inf  size)  )  )  ; 

stop; 

run; 

/♦*=i=The  arrays  ni  and  nj  hold  the  row  and  column  niimbers  for  the** 
****vector  of  the  upper  diagonal  pseudo-data  and  probabilities.  ** 
=i!*:**These  matrices  are  stretched  out  in  the  code  to  allow  the  ** 
****Louis  method  to  be  used.  Ni  and  nj  will  allow  a  map  back  to** 

****the  nonzero  cells  of  the  disease  death  matrix.  ** 

data  rowcol ; 

array  ni{&total};  /*holds  the  row  for  the  stretched  out  matrix*/ 
array  nj{&total};  /*holds  the  collum  for  the  stretched  matrxi*/ 
capj  =  fencols; 
capi  =  fenrows  -1 ; 

do  j j  =  1  to  capj ;  /*This  do  block  puts  the  death  before  disease*/ 

ni(jj)  =  1; 

iij(jj)  =  jj: 

end; 

do  i  =  1  to  capi; 
do  j=  i  to  capj ; 
s  =  i*capj  - 

ni(s)  =  i+1; 
nj(s)  =  j; 

end; 
end; 
output ; 
stop; 

drop  jj  i  j  s  capi  capj; 
rm; 

data  newpseud; 
set  fepseudata; 

array  np(&nrows ,&ncols)  ;  /*contains  cell  probs  including  zeros.*/ 
array  ps(&nrows,&ncols) ;  /*contributions  to  each  cell.*/ 
array  nps-[&total};  /*vector  of  psuedodata  for  the  possible  cells*/ 
array  newp{&total};/*vector  of  cell  probs  for  the  possible  cells*/ 


/*This  do  block  assigns  the  upper  triangular*/ 
/*portion  of  the  possible  cells.  */ 

((i-l)*i/2.0)  +j; 


capj  =  fencols; 
capi  =  fenrows  -1; 

do  jj  =  1  to  capj;  /*This  do  block  puts  the  death  before  disease*/ 

nps(jj)  =  ps(l,jj);  /*probs  and  contributions  into  the  new  vectors*/ 
newp(jj)  =  npClJj) ; 

end; 

/*This  do  block  assigns  the  upper  triangular*/ 
do  i  =  1  to  capi;  /*portion  of  the  possible  cells.  */ 

do  j=  i  to  capj ; 

s  =  i*capj  -  ((i-l)*i/2.0)  +j ; 
nps(s)  =  ps(i+l, j) ; 
newp(s)  =  np(i+l, j) ; 

end; 

end; 

keep  npsl-nps&total  newpl-newp&total  ; 
run; 

/**Since  the  vector  of  probabilities  must  add  to  1,  the  vector  of*** 
***probabilities  used  for  the  information  matrix  is  of  length  *** 
**:*total  -  1 .  *** 

data  feinform; 

set  newpseud  end=last; 

axray  nps{&total}; 

array  newp-C&total} ; 

array  inf o{&infdim,&infdim} ; 

array  im{&infdim,&infdim}; 

do  s  =  1  to  ftinfdim; 

do  t  =  1  to  feinfdim; 

info(s,t)  =  (nps(s)/newp(s))  -  (nps(&total)/newp(&total)) ; 
info(s,t)  =  info(s,t)*((nps(t)/newp(t))  -  (nps(&total)/newp(&total))) ; 
im(s,t)  +  info(s,t); 
end; 

end; 

if  last  then  do; 
numcols  =  fencols; 
numrows  =  fenrows  ; 
inf dim  =  feinfdim; 
end; 
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if  last; 

keep  iml-im&infsize  newpl-newp&total  numrows  numcols  infdim; 
run; 

data  feinform; 

merge  feinform  rowcol; 

rim; 

7,mend ; 

/*eminf o(pseudata,nrows,ncols,inf orm)  */ 

/ ***prepj ont . sas***/ 


***This  code  produces  tables  of  the  joint  probability  *** 
***estimates  along  with  their  stamdaxd  deviations.  *** 
***It  also  fills  out  the  covariance  matrices  for  the  *** 
***two  groups,  in  preparation  for  relative  risk  and  *** 
♦♦^lethality  calculations.  Finally  it  does  a  chi-square  *** 
***test  on  the  two  sets  of  joint  probabilities  and  prints  *** 
♦**out  the  p-values  and  the  two  joint  probability  *** 
***estimates .  *** 


“/.include  ’  c :  \windows\desktop\mortmorb\matrx .  sas  ’ ; 

*/, include  ’  c :  \windows\desktop\mortmorb\estcov .  sas  ’ ; 

’/.include  ’  c :  \windows\desktop\mortmorb\f  illout .  sas  ’ ; 

'/.include  ’  c :  \windows\desktop\mortmorb\quaddif  f  .  sas ! ; 

‘/.matrx(mm.rinf  orm,rmats) ; 

’/.matrx(mm. cinf orm, cmats)  ;  /**Puts  information  matrices  into  matrix  form.*/ 
‘/.estcov(rmats  ,rcov) ; 

'/.estcov(cmats,ccov)  ;/*creates  covariance  matrix  from  information.*/ 
’/.pstdtab(rcov,rrow,rcolumn,  10,8.6,Exposed  Joint  Probabilities); 
'/.pstdtab(ccov, crow, ccolumn, 10, 8. 6, Comparison  Joint  Probabilities)  ; 

’/.f  illcov(rcov,rfilld) ; 

’/.f  illcov(ccov,cfilld) ;  /*puts  out  the  complete  covariance  matrix 

for  all  probabilities.*/ 

’/.quaddif  f  (rf  illd ,  cf  illd , rrow , rcolumn ,  j oint  probs , Exposed , Controls) ; 

/*Compares  the  joint  probability  distributions*/ 
/=i!**quaddif f .  sas  6/3/98  */ 

'/.macro  quaddif  f  (f  ullcovl ,  f  ullcov2 ,  rowf mt ,  coif  mt , measure ,  labl ,  lab2)  ; 

/.t!*********  ***************************************  ************ 

***This  macro  computes  the  quadratic  difference  between  two*** 

***sets  of  probability  vector  estimates  using  their  *** 

***estimated  covariance  matrices.  *** 
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***Both  fullcovl  and  fullcov2  are  data  sets  containing  the  *** 
rows  and  columns  of  a  covciriance  matrix  in  variables  *** 
***nained  covl  through  covk.  They  also  contain  a  variables  *** 
^^^named  p,  row  and  column.  The  variable  named  p  contains  *** 
***the  first  k  of  k+1  probabilities.  The  variables  row  and*** 


***column  conain  the  row  and  column  number  of  the  original  *** 
***matrix  of  rectangles  for  the  probability  in  p.  The  row  *** 
***and  column  values  should  be  the  same  for  both  fullcovl  and*** 
***fullcov2,  since  both  data  set  refer  to  the  same  set  of  *** 
***rectangles .  Letting  iml  and  pi,  im2  and  p2  be  the  *** 

***matrix  and  probability  vector  for  fullcovl  and  2  resp.  *** 
***The  output  will  be  the  quadratic  form  of  pl-p2  with  the  *** 
***sum  of  the  inverses  of  iml  cind  im2.  *** 

***The  value  will  be  in  a  data  set  called  quadiff.  *** 

***Measure  is  the  parameter  being  testd,  e.g.  lethality,  *** 
***labl  and  lab2  that  identify  the  groups.  *** 


data  _null_; 

set  fefullcovl  nobs=n; 

dim  =  n; 

call  symput ( ’ dim ’ , compress (dim) ) ; 

stop; 

run; 

data  _covl_; 
set  fefullcovl; 
keep  covl-cov&dim; 
run; 

data  _pl_; 
set  fefullcovl; 
keep  p; 
rim; 

data  _rcl_; 
set  &fullcovl; 
keep  row  column; 
run; 

data  _rc2_; 
set  &fullcov2; 
row2  =  row; 
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coluiim2  =  column; 
keep  row2  column2; 
run; 

/♦★^Testing  to  be  sure  row  and  column  numbers  are  the  same  in*** 
****both  data  sets.  **/ 

data  rowscols; 
merge  _rcl_  _rc2_  end=last; 
rowdiff  =  row  -  row2; 
coldiff  =  column  -  column2; 
sumrowd  +  rowdiff; 
sumcold  +  coldiff; 
if  last  then  do; 

if  sumrowd  ne  0  then  do; 

put  ’  WARNING  ROW  NUMBERS  ARE  NOT  THE  SAME.’; 

end; 

if  sumcold  ne  0  then  do; 

put  ’  WARNING  COLUMN  NUMBERS  ARE  NOT  THE  SAME.’; 

end; 

end; 

run; 

data  _cov2_; 
set  &fullcov2; 
keep  covl-cov&dim; 
run; 

data  _p2_; 
set  &fullcov2; 
keep  p; 
rim; 

proc  iml; 

/=i=***converting  fullcovl  to  matrix  and  vector  form****/ 
use  _covl_; 

read  all  var  _num_  into  covl; 
use  _pl_; 

read  all  var  {p}  into  pi ; 

/*print  infl  pi;*/ 

/****converting  fullcov2  to  matrix  and  vector  form****/ 
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use  _cov2_; 

read  all  var  _nuin_  into  cov2; 
use  _p2_; 

read  all  vax  {p}  into  p2; 

/*  print  inf 2  p2;*/ 

/^computing  quadratic  difference*/ 
covl2  =  covl  +  cov2; 

eig  =  eigval(covl2) ;  /*eig  is  the  vector  of  Eigenvalues*/ 
/*print  eig;*/ 

create  eig  from  eig[colname  =  ’eig’]; 
append  from  eig;  /*eigen  values  now  in  data  set  named  eig.*/ 
invcovl2  =  ginv(covl2); 

quadiff  =  (pl-p2) ' *invcovl2* (pl-p2) ; 

create  quadiff  from  quadiff ; 
append  from  quadiff ; 

/*  print  quadiff;*/ 
quit ; 

/*******Puts  vector  of  estimates  in  one  dataset****/ 

data  _pl_: 
set  _pl_  ; 
pi  =  p; 
drop  p; 
run; 

data  _p2_; 
set  _p2_  ; 

p2  =  p; 

drop  p; 
run; 

data  bothprob; 

merge  _rcl_  _pl_  _p2_; 

run; 

data  _null_; 
set  quadiff ; 

call  symput  (’quadiff ’ ,compress (coll)) ; 
stop; 
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rmi; 

/*****The  next  data  step  computes  the  p-value  for  the  quadratic  *** 
******diff erence  for  the  null  hypothesis  of  no  difference  betwee*** 
****=i'*the  two  joint  probabilities.  **/ 

/*+first  count  number  of  nonzero  eigen  values  to  get  degrees  of  freedom.*/ 
data  eignonO; 
set  eig; 

if  eig  >  .000000000000000001;  /**note  this  is  how  we  define  zero**/ 
run; 

data  _null_; 

set  eignonO  nobs  =  ordr; 

call  symput ( ’ ordr ’ , compress (ordr) ) ; 

stop; 

run; 

data  _null_; 

pvalue  =  1.0  -  probchiC&quadiff ,&ordr) ; 
call  symput (’pvalue’ , compress (pvalue) ) ; 
stop; 
run; 

proc  print  data  =  bothprob  noobs; 
var  row  col\imn  pi  p2; 

titlel  "The  quadratic  difference  is  fcquadiff."; 

title2  "It  has  feordr  degrees  of  freedom  and"; 

titles  "p-value  =  ftpvalue"; 

title4  "PI  is  femeasure  from  &labl"; 

titles  "P2  is  toneasure  from  &lab2" ; 

format  row  ferowfmt . .  column  fecolfmt..; 

run; 

'/.mend ; 

/ ♦♦♦♦example***/ 

/**Use  in  the  following  way.***/ 

/ **'/oinclude  ’  c :  \windows\desktop\mortmorb\matrx .  sas  ’ ; 

°/oinclude  ’  c :  \windows\deskt op\mortmorb\f  illout .  sas  ’ ; 

'/oinclude  ’  c :  \windows\desktop\mortmorb\estcov .  sas  ’ ; 

y.matrx(mm.rinform,rmats) ;  In  matrx.sas 
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°/oinatrx(nim. cinf orm.cmats)  ;  These  two  calls  put  the  information 

matrices  for  the  k-1  estimates  and  the 
k-1  estimates  into  matrix  and  vector 
form. 

y.estcov(rmats ,rcov) ;  In  estcov.sas. 

'/oestcovCcmats.ccov) ;  inverts  the  info  matrix  and  keeps  the  k-1 

probability  estimates. 

'/of illcov(rcov,rf illd)  ;  In  fillout.sas 

‘/of  illcovCccov, cf  illd)  ;  Gives  the  complete  covariance  matix  and 

for  the  K  probabilities  and  give  the  K 
probability  estimates  in  the  correct  format 
for  use  with  quaddif f . 

'/.quaddif  f  (rf  illd ,  cf  illd ,  rrwo ,  rcolumn ,  j  oint  probs ,  Exposed ,  Controls)  ;  */ 
‘/omacro  ratiostdCparl ,par2,sigl,sig2,cov,outf  ile)  ; 

/*********  *****^^=f:****:|t=)t****5|t******>|t!|c!):!(c:(t!|c***********!|= 

***This  macro  computes  the  as3nntotic  standard  deviation  of*** 


***the  ratio  parl/par2  as  well  as  the  ratio  itself.  *** 
***Parl,  par2  axe  the  values  of  the  estimators  and  *** 
***sigl  and  sig2  are  the  variances  of  pari  and  par2,  *** 
^♦♦respectively.  Cov  is  the  covariance  of  pari  and  par2.  *** 
***The  computation  assiimes  that  the  vector  (parl,par2)  is  *** 
♦^♦approximately  normal  (in  the  aymtotic  sense)  with  *** 
♦♦♦covariance  matrix  given  by  sigl,sig2  cov.  The  varicuice^^^ 
♦♦♦formula  is  derived  from  the  delta  rule.  *** 
♦♦♦Outfile  contains  one  observation  with  variable  ratio  *** 
♦♦♦and  stdev,  which  are  parl/par2  and  the  standard  *** 
♦♦*deviation  of  the  ratio.  Macro  veiriables  by  the  same  *** 
***name  also  contian  these  value.  *** 


♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦/ 
‘/oglobal  ratio  stdev; 

‘/.local  outfile; 
data  feoutfile; 
pi  =  feparl; 
p2  =  &pax2; 
si  =  fesigl; 
s2  =  &sig2; 
c  =  &cov; 
output ; 
stop; 
run; 
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data  ftoutfile; 
set  &outfile; 
ratio  =  pl/p2; 

stdev  =  sl*sl  -  2*c*ratio  +  s2*s2*ratio*ratio ; 

stdev  =  stdev/ (p2*p2) ; 

stdev  =sqrt (stdev) ; 

call  symputC’ ratio’ , compress (ratio)) ; 

call  sjrmput  ( ’  stdev  ’ ,  compress  (stdev)  )  ; 

keep  ratio  stdev; 

run; 

/* 

'/.put  The  ratio  is  feratio; 

'/.put  The  standard  deviation  is  festdev;  */ 

'/.mend ; 

/*  ratiostd(parl,par2,sigl,sig2,cov,outfile) ;*/ 
/***realfreq.sas  3/30/98***/ 

'/.macro  realfreq(inf ile,outfile,disfmt .dethfmt .libdrive) ; 


***This  macro  takies  an  file  named  in  infile  that  *** 
***contains  variables  obst,  obsd,  idr,  idL,  itr,,itL.  *** 
♦**obst  and  obsd  are  the  observed  times  for  time  to  *** 
**:t!disease  and  time  to  death,  respectively.  *** 
***Idr  and  idL  are  indicators  that  indicate  whether  *** 


***time  to  death  is  censored  on  the  right  and/or  left,  *** 
♦^^respectively .  An  indicator  equal  to  1  indicates  ♦*♦ 
♦♦♦censored.  Itr  and  itL  axe  the  censoring  indicators  ♦♦♦ 


*=i=*for  time  to  disease.  The  outfile  will  contain  ♦♦♦ 
♦*=t:proc  freq  output  for  the  cells  with  rows  for  disease^^^ 
♦♦♦and  columns  for  the  death  variable.  ♦♦♦ 
♦♦♦The  proc  freq  output  will  be  for  the  uncensored  ♦♦♦ 
♦♦♦disease  and  death  pairs,  including  those  where  ♦♦♦ 
♦♦♦death  is  known  and  disease  is  known  not  to  have  ♦♦♦ 
♦♦♦occurred  during  the  lifetime,  categorized  in  ♦♦♦ 
♦♦♦rectangles  defined  by  formats  disfmt  for  the  rows  ♦♦♦ 
♦♦♦and  dethfmt  for  the  cols.  ♦♦♦ 


libname  library  "felibdrive"; 

/♦♦♦♦The  next  data  step  keeps  only  the  uncensored  pairs  and  ♦♦♦ 
♦♦♦♦♦those  pairs  with  uncensored  death  time  and  disease  time^^^ 
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*sc***kiiown  to  have  occurred  after  death.  ♦+/ 

data  actual; 
set  ftinfile; 
sumd  =  idr  +  idl; 
sum  =idr  +  idl  +  itr  +itl; 
if  sum=0  or 

(obst  =  .  and  sumid  =  0) ; 

keep  obst  obsd; 
run; 

proc  freq  data  =  actual  Z+noprint*/; 

title  ’Uncensored  disease  by  death  counts’; 

format  obst  fedisfmt..  obsd  &dethfmt..; 

table  obst*obsd/missing  sparse  out=  feoutfile; 

run; 

7, mend ; 

Z^^’t'This  program  computes  the  relative  risk  ratio  for  all  rectangles  in  *** 
****the  matrix  of  time-to-onset  by  time-to-death  matrix.  One  needs  to  *** 
^^♦♦run  fixboth.sas  to  get  rinform  and  cinform,  the  information  matrices*** 
****and  probability  estimates  along  with  row  and  column  indices  for  *** 
****Exposed  and  controls.  The  cov  matrices  are  obtained  by  the  *** 
****f allowing  sequence. 

7.matrx  (mm .  rinf  orm ,  rmat s)  ; 
y,matrx (mm .  cinform, cmats)  ; 
y,estcov(rmats,rcov) ; 
yestcov (cmats jCcov) ; 
y,f  illcov(rcov,rf  illcov)  ; 

7of  illcov(ccov,cfillcov)  ; 

data  sets  fenumcov  and  fedencov  contain  the  complete  covariance*** 
**=c=t:matrices , probability  estimates  and  row  column  indicators  in  *** 

****variables  covl-covK,p,  and  row  column  in  K  observations.  *** 

/****This  macro  requires  that  the  row  and  coliunn  indices  have  formats  *** 
*****rrow  and  rcolumn  for  the  original  rectangle  row  and  column  *** 

*****def initions .  These  are  outputs  from  the  fixboth  macro  and  should  *** 
*****be  the  same  as  formats  crow  and  ccoliimn.  *** 

ymacro  relrisk(numcov,dencov, title) ; 

/***The  following  code  gets  the  dimension  of  the  pvector  for  the  two*** 


/*disease  euid  death  times  known*/ 

/*death  known  to  have  occurred 

before  disease  and  death  time  known*/ 
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****groups  and  tests  to  see  if  they  are  equal.  *** 

data  _null_; 

set  ftnumcov  nobs=n; 

call  S3niiput(’rdim’ .compress (n))  ; 

stop; 

rim; 

data  _null_; 
set  fedencov  nobs=n; 
if  n  =  fcrdim  then  do; 
end  =  0; 

call  symput ( ’ end ’ , compress (end) ) ; 
stop; 

end; 

else  do; 

end  =■  1 ; 

call  symput (’end’ .compress (end)) ; 
stop; 

end; 

rim; 

‘/,if  &end  =  1  7, then  y,do; 

y.put  The  dimensions  of  the  two  probability  vectors  are  unequal.; 
y,put  We  cannot  compute  the  requested  table . ; 

'/.end; 

‘/.else  ’/.do; 

data  rcovp; 
set  fenumcov; 
array  cov{&rdim}; 
array  rcov{&rdim}; 

‘/.do  i  =  1  ’/.to  ferdim; 

rcov(&i)  =  cov(&i) ; 

‘/.end ; 

rp  =  p; 

rrow  =  row; 
rcolumn  =  column; 

keep  rp  rrow  rcolumn  rcovl-rcov&rdim; 
run; 
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data  ccovp; 
set  fedencov; 
array  cov{&rdim}; 
array  ccov{&rdini}; 

'/odo  i  =  1  “/oto  ferdim; 

ccov(&i)  =  cov(&i); 

'/.end; 

cp  =  p; 
crow  =  row; 
ccolumn  =  column; 

keep  cp  crow  ccolumn  ccovl-ccov&rdim; 
run; 


data  bothcovp; 
merge  rcovp  ccovp; 
run; 

/***testing  to  be  sure  the  row  and  column  indices  match.***/ 

data  _null_; 
set  bothcovp  ; 

if  rrow  ne  crow  or  rcolumn  ne  ccolumn  then  do; 
end  =  1; 

call  symput ( ’ end ’ , compress (end) )  ; 
stop; 

end; 

rim; 

'/.if  &end  =  1  '/.then  '/.do; 

'/.put  The  row  and/or  column  indices  for  the  two  vectors; 
'/.put  Do  not  match.  ; 

'/.put  We  must  stop  here.; 

'/.end; 

"/.else  "/.do; 

data  _ratstd_; 
set  bothcovp; 
array  rcov{&rdim}; 
array  ccov{&rdim}; 
relrisk  =  rp/cp; 
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sigl  =  rcov(_n_); 
sig2  =  ccov(_n_); 

var  =  (sigl  +  sig2*relrisk*relrisk)/(relrisk*relrisk) ;  /fusing  delta  rule*/ 

stdev  =  sqrt(var); 

row  =  rrow; 

column  =  rcolumn; 

keep  relrisk  stdev  row  column; 

rim; 

/***In  here  we  should  use  the  table  macro**/ 

“/oinclude  ’  c ;  \windows\desktop\mortmorb\tabmac .  sas  ’ ; 
y.tableit  (_ratstd_  ,row,  coliimn, 

rrow , rcolumn ,10,8.6, relrisk  stdev , t ime-t o-onset , 
time-to-death,&title) ; 

'/ofind; 

'/.end; 
y,mend ; 

/*example*/ 

/* 

'/.include  ’  c :  \windows\deskt op\mortmorb\matrx .  sas  ’ ; 

'/.include  ’  c :  \windows\desktop\mortmorb\estcov .  sas  ’ ; 

'/.include  ’  c :  \windows\deskt op\mortmorb\f  illout .  sas  ’ ; 

'/.mat rx  (mm .  r  inf  orm ,  rmat  s ) ; 

'/.matrx(mm.  cinform,cmats) ; 

'/.estcov(rmats,rcov) ; 

'/.estcov(cmats,ccov)  ; 

'/,f illcov(rcov,rf illcov)  ; 

'/.f  illcov(ccov,cf  illcov) ; 

'/,relrisk(rf illcov, cf illcov, Relative  Risk  Exposed  to  Controls);  */ 
/****revzero . sas  3/30/98**/ 

'/.macro  nextzero  (probs , times  ,numrows  ,numcols  ,numtimes , 

probpref ,newprobs ,newtimes) ; 

***This  macro  checks  to  see  if  the  oldrows  by  oldcols  *** 

***matrix  of  probabilities  in  data  set  probs  has  any  *** 

***zero  cells  in  the  first  row  or  in  the  diagonal  and  *** 

***upper  diagonal  of  the  lower  oldrows  -1  by  oldcols  *** 

***matrix.  Probpref  is  the  prefix  for  this  array  of  *** 

***probabilities .  times  contains  the  array  of  endpoint*** 

***for  the  time  intervals  with  prefix  t.  *** 
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***  As  soon  as  a  zero  cell  is  found  the  *** 

***matrix  is  collapsed  by  combining  the  column  with 
***the  zero  entry  with  an  adjacent  column.  If  there  *** 

***is  a  choice  of  columns  to  merge  with  the  zero  cell  *** 
***column  the  column  whose  cell  with  the  same  row  as 
***the  zero  cell  with  the  smallest  probability  is  used*** 

***to  combine  with  the  zero  cell.  After  collapsing  *** 

***the  process  of  finding  zero  cells  and  combining  *** 
***columns  is  continued  until  the  resulting  matrix  has*** 

***ho  zero  cells  in  the  first  row  or  diagonal  or  upper*** 
***diagonal  of  the  matrix  below  the  first  row.  *** 

***The  new  probability  matrix  is  in  a  data  set  newprobs*** 

***with  prefix  newp.  The  new  number  of  rows  and  *** 

***columns  are  in  global  macro  variables  named  newrows*** 

**=itand  newcols.  The  dataset  newtimes  contains  the  *** 

***new  times  that  form  the  endpoints  for  the  row  and  *** 
***co1uiTm  intervals  in  an  array  t.  The  new  number  *** 

♦**of  endpoints  is  in  the  macro  variable  newtme.  *** 

***oldtimes  is  the  number  of  endpoints  used  for  the  *** 
♦^♦disease  and  death  intervals.  *** 

***A  global  macro  variable  yes  is  produced.  Yes  =  1  *** 

***if  a  zero  cell  was  found  and  thus  collapsing  done.  *** 

***Yes  =  0  if  no  collapsing  was  done.  If  Yes  =  0  then*** 

***the  output  matrix  of  probabilities  and  the  output  *** 

♦**array  of  times  is  the  same  as  the  input.  *** 

data  _null_; 

oldtot  =  &numrows*&numcols; 
oldrows  =  fenumrows; 
oldcols  =  &numcols; 
oldtimes  =  fenumtimes; 

call  symput (’oldtot ' .compress (oldtot)) ; 
call  sjraiput ( ’ oldrows ’ , compress (oldrows) ) ; 
call  symput ( ’ oldcols ’ , compress (oldcols) ) ; 
call  symput (’oldtimes’ .compress (oldtimes)) ; 
stop; 
run; 

data  _probs_;  /**Putting  original  rectangle  probs  into  _probs_*/ 
set  feprobs; 

array  feprobpref .{feoldrows.&oldcols};  /*probpre  is  the  prefix  for 

the  cell  probs . */ 
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array  oldp{&oldrows,&oldcols}; 
do  i  =  1  to  feoldrows; 
do  j=l  to  ftoldcols; 

oldp(i,j)  =  fcprobpref . (i, j) ;  /^Renaming  the  old  probs . */ 

end; 

end; 

keep  oldpl-oldp&oldtot ; 
run; 

data  _probs_;  /**Putting  original  times  into  _times_.*/ 

/*and  creating  working  data  set*/ 
merge  _probs_  fetimes; 
array  t{&oldtimes}; 
array  oldt{&oldtimes}; 

do  k  =  1  to  feoldtimes;  /♦♦renaming  original  times**/ 
oldt(k)  =  t(k); 
end; 

drop  tl-t&oldtimes ; 
run; 

'/.global  newrows  newcols  newtot  newtime  yes; 

’/.let  redo  =  1; 

'/.let  first  =  1;  /*This  indicates  that  we  are  starting 

♦♦the  first  pass  at  checking  for  zeros.*/ 

’/.do  ’/.while  (&redo=l)  ;  /*continue  searcing  process  if  redo  is  1.*/ 

/♦Redo  changes  to  0  when  no  upper  diagonal 
♦♦zero  cells  are  left.**/ 

/*j(t*:(c.(!*****  ♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦’•'♦♦♦ 

following  data  step  find  the  first  zero  cell  if  any*** 
****and  puts  the  row  and  column  nximber  of  the  cell  in  macro*** 
♦♦♦♦variables  row  and  col.  *** 

*♦♦*♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦/ 
data  _null_; 
set  _probs_; 

array  oldp(&oldrows,&oldcols) ; 
first  =  ftifirst; 

do  j=  1  to  feoldcols;  /♦♦♦checking  first  row**/ 

if  oldp(l,j)  =  0  then  do; 
i=l; 

call  symput ('row’ .compress (i)) ; 
call  symputC’col’ ,compress(j)) ; 
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if  first  then  do; 
yes  =  1; 

call  symput (’yes' .compress (yes)) ; 
first  =  0; 

call  symput ( ’ f irst ’ .compress (first) ) ; 
end; 

stop; 

end; 

end; 

do  i=2  to  feoldrows;  /^checking  lower  sumatrix*/ 

do  j=  i-1  to  ftoldcols;  /*0nly  the  diagonal  and  upper  diagonal.*/ 

if  oldp(i,j)  =  0  then  do; 

call  symput (’row’ .compress (i)) ; 
call  symput (’ col’ .compress(j)) ; 
if  first  then  do; 
yes  =  1; 

call  symput (’yes’ .compress (yes)) ; 
first  =  0; 

call  symput (’first’ .compress(f irst) ) ; 
end; 

stop; 

end; 

end; 

end; 

redo  =  0;  /*We  only  get  here  if  no  zeros  foiind  in  which  case 

**  this  is  the  last  search.*/ 

if  first  then  do; 

yes  =  0; 

call  symput (’yes’ .compress (yes))  ; 
end; 

call  symput ( ’ redo ’ . compress (redo) ) ; 

stop; 

rim; 

***We  begin  combining  column  col  with  column  col-1  or  col+1.*** 

***Keep  in  mind  that  the  interval  for  column  col  has  *** 

***endpoints  times(col-l)  and  times(col+l) .  The  *** 

***corresponding  row  if  oldrows  >=  col+1  is  row  equal  col+1.*** 
***Regardless  of  which  columns  are  collapsed  oldcols  will  be*** 
***reduced  by  one ,  the  times  will  be  reduced  by  1 .  The  *** 
***number  of  rows  will  be  reduced  by  1  if  oldrows>col+l .  *** 
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‘/oif  feredo  =  1  Xthen  ’/odo; 

/*y,if  '/oeval C&oldrows)  >  */.eval(&col  +  1)  '/othen  ‘/.do; 

%let  newrows  =  ‘/.eval  (feoldrows  -  1); 

“/oend; 

’/.else  "/.do; 

'/.let  newrows  =  ‘/.eval  C&oldrows) ; 

‘/.end;*/ 

/******The  following  code  determines  whether  we  merge  the  zero***** 
*******column  with  the  right  column  or  with  the  left  column. 
*******The  macro  variable  merge  is  1  if  we  merge  right  and  -1  ***** 
♦*si=**:*;*if  we  merge  left.  ****:♦: 

data  _null_; 
set  _probs_; 

array  oldp{&oldrows ,&oldcols} ; 

oldrows  =  feoldrows; 

oldcols  =  feoldcols; 

row  =  &row; 

col  =  &col; 

if  col  =  1  or  row  =  col+1  or  (1  <  col  <  oldcols  and 

oldp (row, col-1)  >  oldpCrow, col+1))  then 
merge  =  1; 

else  merge  =  -1; 

/****Now  we  determine  the  number  of  newrows.**/ 
if  merge  =  1  then  do; 

if  oldrows  >  col  +  1  then  newrows  =  oldrows  -  1; 
else  newrows  =  oldrows; 

end; 

else  if  merge  =  -1  then  do; 

if  oldrows  >=  col  +  1  then  newrows  =  oldrows  -1; 
else  newrows  =  oldrows; 

end; 

call  symput ( ’merge ’ , compress (merge) ) ; 
call  symput ( ’ newrows ’ , compress (newrows) ) ; 
run; 

‘/.let  newcols  =  ‘/.eval(&oldcols  -  1); 

‘/.let  newtime  =  ‘/.eval(&oldtimes  -  1); 


****=i«=i==i=*=(:********Now  the  collapsing  begins .  **********=»:♦♦♦*♦!»:**♦**** 


data  _probs_; 
set  _probs_; 

array  oldp{&oldrows ,&oldcols}; 

array  newp{&newrows,&newcols};  /^array  of  newprobs*/ 
array  oldt-[&oldtimes}; 

array  newt{&newtime3-;  /♦array  to  contain  newtimes*/ 

row  =  &row;  /♦row  where  zero  is*/ 

col  =  &col;  /♦column  where  zero  is*/ 

newrows  =  fenewrows; 

newcols  =  &newcols; 

merge  =  femerge; 

if  (col=l  or  (  col=2  and  merge  =  -1)) 

then  do;  /♦merging  old  cols  1  eind  2*/ 

/♦If  col  is  one  we  can  only  combine 
♦♦col  1  and  2.  We  combine  col  1  and  2 
♦♦also  col=2  old  prob  is  smaller  in  row  !♦/ 

do  k  =  1  to  fenewtime; 

newt(k)  =  oldt(k+l);  /♦all  times  move  up  !♦/ 

end; 

newp(l,l)=  oldp(l,l)  +  oldp(l,2); 
nex7p(2,l)=  oldp(2,l)  +  oldp(2,2)  + 
oldp(3,l)  +  oldp(3,2)  ; 
if  newrows  >=  3  then  do; 

do  i  =  3  to  fenewrows; 

newp(i,l)  =  oldp(i+l,l)  +  oldp(i+l,2); 
end; 
end; 

do  j  =  2  to  fenewcols; 

newp(l,j)  =  oldp(l,j+l); 
newp(2,j)  =  oldp(2,j+l)  +  oldp(3,j+l); 
if  newrows  >=  3  then  do; 
do  i  =  3  to  &newrows; 

newp(i,j)  =  oldp(i+l, j+1) ; 
end; 


no 


end; 


end; 

else  if  (col=2  and  merge  =  1) 
then  do; 

/♦merging  old  columns  2  and  3.*/ 

newt(l)  =  oldt(l); 
do  k  =  2  to  &newtime; 
newt(k)  =  oldt(k+l); 

end; 

do  i  =  1  to  2; 
newp(i,l)  =  oldp(i,l); 
newp(i,2)  =  oldp(i,2)  +  oldp(i,3); 
if  fenewcols  >=3  then  do; 
do  j  =  3  to  fenewcols; 

newp(i,j)  =  oldp(i,j+l); 
end; 

end; 

end; 

if  &oldrows>=  4  then  do; 

newp(3,l)  =  oldp(3,l)  +  oldp(4,l); 
newp(3,2)  =  oldp(3,2)  +  oldp(3,3)  + 
oldp(4,2)  +  oldp(4,3); 
if  newcols  >=  3  then  do; 
do  j=3  to  fenewcols; 

newp(3,j)  =  oldp(3,j+l)  +  oldp(4,j+l); 
end; 

end; 

end; 

else  do; 

newp(3,l)  =  oldp(3,l); 
newp(3,2)  =  oldp(3,2)  +oldp(3,3); 
if  newcols  >=  3  then  do; 
do  j  =  3  to  fenewcols; 

newp(i,j)  =  oldp(i,j+l); 

end; 

end; 

end; 

if  newrows  >=4  then  do; 
do  i=4  to  fenewrows; 

newp(i,l)  =  oldp(i+l,l); 


Ill 


newp(i,2)  =  oldp(i+l,2)  +  oldp(i+l,3); 
if  newcols  >=  3  then  do; 
do  j=  3  to  fenewcols; 

newp(i,j)  =  oldp(i+l, j+1) ; 
end; 
end; 
end; 
end; 
end; 


else  if  (col  =  ftoldcols  or 

(  2<col<&oldcols  aind  merge  =  -1)  ) 
then  do; 

/♦combines*/ 


/♦col-1  and  col*/ 


do  k  =  1  to  col-2; 

newt (k)  =  oldt (k) ; 
end; 

do  k  =  col-1  to  fenewtime; 

newt(k)  =  oldt(k+l); 

end; 

do  i  =  1  to  min(&newrows, col-1) ; 

do  j  =  1  to  col-2; 

newp(i,j)  =  oldp(i,j); 
end; 
end; 

do  i  =  1  to  minC&newrows , col-1) ; 

newpCi , col-1)  =  oldp(i, col-1)  +  oldp(i,col); 
if  col  <  feoldcols  then  do; 

do  j  =  col  to  ftnewcols; 

newp(i,j)  =  oldp(i,j+l); 
end; 

end; 

end; 

if  feoldrows  =  col  then  do; 
do  j  =  1  to  col-2; 

newp(col,j)  =  oldp(col,j); 

end; 

newp (col, col-1)  =  oldp(col, col-1)  +  oldp(col,col) ; 
if  col  <  feoldcols  then  do; 
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do  j  =  col  to  fenewcols; 

newp(col,j)  =  oldpCcol, j+1) ; 

end; 

end; 

end; 

if  feoldrows  >=  col  +1  then  do; 
do  j  =  1  to  col-2; 

newp(col,j)  =  oldp(col,j)  +  oldp(col+l, j) ; 
end; 

newp (col, col-1)  =  oldp(col, col-1)  +  oldpCcol , col)  + 

oldp(col+l, col-1)  +  oldp(col+l,col) ; 
if  col  <  ftoldcols  then  do; 
do  j  =  col  to  fenewcols; 

newp(col,j)  =  oldpCcol, j+1)  +  oldp(col+l , j+1) ; 

end; 

end; 

if  newrows  >=  col  +  1  then  do; 
do  i  =  col  +1  to  fenewrows; 
do  j  =  1  to  col-2; 

newp(i,j)  =  oldp(i+l,j); 
end; 

end; 

do  i  =  col+1  to  ftnewrows; 

newpCi, col-1)  =  oldp(i+l , col-1)  +  oldp(i+l ,col) ; 

end; 

do  i  =  col+1  to  fenewrows; 
if  col  <  feoldcols  then  do; 
do  j  =  col  to  ftnewcols; 

newp(i,j)  =  oldp(i+l , j+1) ; 
end; 
end; 
end; 
end; 
end; 
end; 

else  if  (2<col<&oldcols  and  merge  =  1) 

then  do;  Z+combine  columns  col  and  col+l+/ 

do  k  =  1  to  col-1; 

newt(k)  =  oldt(k);  /*times  the  same  up  to  t (col-1)*/ 
end; 
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do  k  =  col  to  fenewtime;  /*lose  the  old  t(col)*/ 
newt(k)  =  oldt(k+l); 

end; 

do  i  =  1  to  minC&newrows , col) ; 

newp(i,col)  =  oldp(i,col)  +  oldp(i , col+1) ; 
do  j  =  1  to  col-1 ; 

newp(i,j)  =  oldp(i,j); 

end; 

end; 

do  i  =  1  to  min(&newrows,col) ; 
do  j  =  col+1  to  fenewcols; 

newp(i,j)  =  oldp(i,j+l); 

end; 

end; 

if  newrows  >=  col+1  then  do; 
if  newrows  =  col+1  then  do; 
do  j  =  1  to  col-1; 

/*if  col+1  is  last  new  row*/ 
newp(col+l, j)  =  oldp(col+l , j )  ;/*then  new  prob  =  old  for*/ 

/*coliuDns  up  to  col  in  row*/ 
end;  /*col  +  1.*/ 

newp (col+1, col)  =  oldp (col+1 , col)  +  oldp (col+1, col+1) ; 
do  j  =  col+1  to  ftnewcols; 

newp(col+l , j)  =  oldp(col+l, j+1)  ; 

end; 

end; 

else  do; 

do  j  =  1  to  col-1; 

newp(col+l, j)  =  oldp(col+l , j)  +  oldp(col+2, j) ; 
end; 

newp (col+1 , col)  =  oldp (col+1 , col)  +  oldp (col+1, col+1)  + 
oldp (col+2, col)  +  oldp(col+2, col+1) ; 
do  j  =  col+1  to  fenewcols; 

newp(col+l, j)  =  oldp(col+l , j+1)  +  oldp(col+2, j+1) ; 

end; 

end; 

end; 

if  newrows  >  col  +  2  then  do; 
do  i  =  col  +2  to  fenewrows; 
do  j  =  1  to  col-1; 
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newp(i,j)  =  oldp(i+l , j) ; 
end; 

end; 

do  i  =  col+2  to  fenewrows; 

newp(i,col)  =  oldp(i+l ,col)  +  oldp(i+l,col+l) ; 

end; 

do  i  =  col+2  to  fenewrows; 
do  j  =  col+1  to  fenewcols; 

newp(i,j)  =  oldp(i+l, j+1) ; 
end; 
end; 
end; 
end; 

newtot  =  fenewrows*fenewcols ; 

call  symputC’ newtot’ , compress (newtot)) ; 

run;  /♦♦Need  to  check  all  ‘/.ends  and  ends^^/ 

data  _probs_; 

set  _probs_; 

keep  newpl-newpfenewtot  newtl-newtfenewtime  ; 
run; 

data  _probs_; 

set  _probs_;  /♦♦♦putting  current  probs  and  times  in  old  array 
♦♦for  the  next  do  while  loop.^^/ 
array  newp{fenewtot3-; 
array  oldp-Cfenewtot}; 
cirray  newt{fenewtime}; 
array  oldt{fenewtime}; 
do  i  =  1  to  fenewtot; 

oldp(i)  =  newp(i) ; 

end; 

do  k  =  1  to  fenewtime; 
oldt(k)  =  newt(k); 

end; 

oldtot  =  fenewtot; 
oldrows  =  fenewrows; 
oldcols  =  fenewcols; 
oldtimes  =  fenewtime; 

call  symputC’oldtot’ , compress (oldtot)) ; 
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call  symput ( ’ oldrows ’ , compress (oldrows) ) ; 
call  symput ( ’ oldcols ’ , compress (oldcols) ) ; 
call  symput ( ’ oldtimes ’ , compress (oldtimes) ) ; 
keep  oldpl-oldp&newtot  oldtl-oldt&newtime ; 
run; 

°/oend;  /**This  ends  the  do  if  redo=l  group**/ 
°/oend;  /**This  ends  the  do  while  redo=l  group**/ 

/**attempt  to  fix  numcols  problem  from  SA**/ 

°/oif  &yes  =  0  %then  ‘/do; 
data  _null_; 
newrows  =  ftoldrows; 
newcols  =  feoldcols; 

call  symput (’ newrows ’ .compress (newrows)) ; 
call  symput ( ’newcols ’ , compress (newcols) ) ; 
stop; 
run; 

‘/.end ; 

data  fenewprobs; 
set  _probs_; 
array  oldp-C&oldtot}; 
array  newp{&oldtot} ; 
newrows  =  fenewrows; 
newcols  =  fcnewcols; 
do  i  =  1  to  feoldtot; 

newp(i)  =  oldp(i) ; 
end; 

keep  newpl-newp&oldtot  newrows  newcols; 
run; 

data  fenewtimes; 
set  _probs_; 
array  oldt{&oldtimes} ; 
array  t{&:oldtimes}; 
newrows  =  fenewrows; 
newcols  =  fenewcols; 
do  k  =  1  to  feoldtimes; 

t(k)  =  oldt(k); 
end; 
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keep  tl-t&oldtimes  newrows  newcols; 
run; 

7omend ; 

/*  nextzero (probs .times .numrows .numcols ,numtimes , 

probpref .newprobs .newtime s) ; */ 
/*%nextzero (f nlprobs .times . fenumrows . fcnumcols . fenumt imes . 

newp.newprobs.newtimes) ;  */ 


data  mm. exposed  mm. control; 
set  mm. cancer; 

if  id  =  ’E'  then  output  mm. exposed; 
else  if  id  =  ’C’  then  output  mm. control; 
run; 

‘/.macro  submtrx(inf ile.Bi .Bj  .Ei.Ej  .outfile)  ; 

*♦***:(!  ^J(t3(::t:!(t3(t*!t!*=t!**********  ♦***♦*=*=*  ****** 

♦♦♦This  macro  produces  the  covariance  matrix  for^^^ 
♦♦♦the  estimates  of  the  subvector  of  joint  ♦♦♦ 

♦♦♦probability  estimates  for  a  matrix  of  ♦♦♦ 

♦♦♦rectangles.  It  also  pulls  out  the  ♦♦♦ 

♦♦♦corresponding  probabilities  and  their  ♦♦♦ 

♦♦♦their  original  row  and  column  idices.  ♦♦♦ 

♦♦♦Infile  contains  in  variables  named  covl-covK  ♦♦♦ 
♦♦♦the  covariance  matrix  for  the  vector  of  joint^^^ 
♦♦♦probability  estimates  whose  values  eire  in  ♦♦♦ 

♦♦♦a  variable  called  p.  These  are  the  first  K  ♦♦♦ 
♦♦♦probabilities  of  K+1  probabilities.  ♦♦♦ 

♦♦♦infile  must  also  contain  variables  Row  and  Column  ♦♦♦ 
♦♦♦which  are  the  rectcuigle  coordinates  for  the  ♦♦♦ 
♦♦♦estimates  in  p.  ♦♦♦ 

♦♦♦Bi  and  Bj  are  the  row  and  column  coordinates  ♦♦♦ 
♦♦♦for  the  first  rectangles  whose  probability  is^^^ 

♦♦♦to  be  included.  Ei  and  Ej  are  the  row  and  ♦♦♦ 
♦♦♦column  coordinates  for  the  last  probability  ♦♦♦ 

♦♦♦to  be  included.  The  set  of  probabilities  ♦♦♦ 
♦♦♦defined  by  the  rectacle  of  coordinates  from  ♦♦♦ 


♦♦♦(Bi.Bj)  to  (Ei.Ej)  will  be  pulled  from  the  ♦♦♦ 
♦♦♦p  variable  in  infile.  The  corresponding  ♦♦♦ 
♦♦♦submatrix  will  be  pulled  from  the  columns  ♦♦♦ 
♦♦♦covl-covK.  If  the  set  of  coordinates  ♦♦♦ 
♦♦♦the  (K+1) St  prbability  then  the  respective  ♦♦♦ 
♦♦♦variances  and  covariances  will  be  computed.  ♦♦♦ 
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***output  will  be  a  sas  data  set  containing  *** 
***the  submatrix  in  the  variables  names  subcovl  *** 
***through  subcovS,  where  S  is  the  number  of  *** 
=i=**probablities  in  the  subvector.  A  variable  *** 
***named  p  will  contain  the  subvector  of  *** 
***probabilities .  Row  and  Column  will  be  *** 
★♦♦variables  containing  the  coordinates  of  the  ♦♦♦ 
♦♦♦probabilities  in  the  original  matrix  of  *** 
♦♦♦joint  probabilities.  ♦♦♦ 


Data  _input_; 

set  feinfile  nobs  =  n  end=done; 
if  _n_  =  1  then  do; 

call  symput  ('origdim\ compress (n) )  ;  /♦origdim  is  the  dimension  of  the 

original  covariance  matrix.  There 
are  thus  origdim+1  joint  probabilities . ♦/ 

end; 

if  (  &Bi<=  Row  <=  &Ei)  and  (&Bj  <=  Column  <=  &Ej)  then  do; 

_keep_  =  _n_; 

end; 

else  _keep_  =  0; 
rxm; 

data  _keeprs_; 
set  _input_; 
if  _keep_  ne  0; 
keep  _keep_; 
riin; 

data  _null_; 

set  _keeprs_  nobs  =  n; 

call  symput (' count compress (n)) ; 

stop; 

run; 

data  _keeprs_ ; 
set  _keeprs_  end  =  last; 
array  subp{&coimt} ; 
retain  subpl-subp&count ; 
subp(_n_)  =  _keep_; 
if  last; 

keep  subpl-subp&count; 


run; 


data  _null_; 
set  _keeprs_; 
array  subp{&count} ; 

°/,do  i  =  1  ‘/.to  fecount; 

call  symputC'keep&i" .compress (subpC&i))) ; 

“/oend; 

stop; 

run; 

Data  feoutfile; 
set  _input_; 
if  _keep_  ne  0; 
keep  row  column  p; 
y«do  i  =  1  7, to  fecount; 

keep  covMkeep&i ; 
y.end; 
run; 
y.mend ; 

/*submtrx(infile,Bi,Bj ,Ei,Ej .outfile) ;*/ 

/♦options  notes  symbolgen; 
y,submtrx(rcov,2,2,3,3,out) ;  */ 

‘/macro  svunstd(covprc, prefix, outfile)  ; 

♦♦♦This  macro  computes  the  sum  and  standard  deviation*** 
***of  the  s\im  of  probability  estimates.  *** 

♦♦♦covprc  is  a  SAS  data  set  with  columns  prefixl  *** 
♦♦♦through  prefixK  containing  the  covarieoice  matrix  ♦♦♦ 
***of  the  probabilities  in  the  variable  p,  where  K  ♦♦♦ 
♦♦♦is  the  number  of  observations  (and  probabilities) . ♦♦♦ 
=it**covprc  also  contains  variables  row  and  column  ♦♦♦ 
♦♦♦Outfile  will  be  a  SAS  data  set  two  variables  whose*** 
♦♦♦names  are  sum  and  stdev  and  one  observation.  ♦♦♦ 

♦♦♦Sum  and  stdev  will  ♦♦♦ 

♦♦♦be  the  names  of  global  macro  variables  containing  ♦♦♦ 
♦♦♦the  sum  of  the  probabilities  and  the  standard  ♦♦♦ 
♦♦♦deviation  of  the  sum.  ♦♦♦ 

’/.global  sum  stdev; 

proc  contents  data  =  fecovprc  out=_conts_  noprint; 
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run; 

%let  preflen  =  °/olength(&pref  ix) ; 
data  _conts_; 
set  _conts_; 
retain  _pref_; 

_pref_  =  upcase(substr(name,l,&preflen)) ; 
if  _pref_  =  upcaseC'&pref ix") ; 
keep  name; 
riin; 

data  _null_; 

set  _conts_  nobs=n; 

call  symputC’n’ , compress (n)) ; 

stop; 

run; 

data  _conts_; 

set  _conts_  end=last; 

cirray  cov{&n}  $; 

retain  covl-cov&n; 

cov(_n_)  =  name; 

if  last ; 

run; 

data  _null_; 
set  _conts_; 
array  cov{&n}  $; 
y.do  i  =  1  y,to  &n; 

call  symput ("cov&i" , compress (cov(&i)))  ; 
y.end; 
stop; 

run;  Z+names  of  the  covariance  columns  are  in  macros  covl-covn*/ 

data  feoutfile; 

set  fecovprc  end=last; 

rowtot  =  0; 

“/odo  i  =  1  y»to  &n; 

rowtot=  rowtot  +  &&cov&i ; 

“/.end ; 

variance  +  rowtot; 
sum  +  p; 
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if  last  then  do; 

stdev  =  sqrt (variance) ; 

call  symput  ( '  sum ' ,  compress  (siim)  )  ; 

call  symput ( ’ stdev ’ , compress (stdev) ) ; 

end; 

if  last; 

keep  sum  stdev; 

run; 

/* 

’/oput  The  sum  of  the  probalities  is  &sum; 

7,put  The  standard  deviation  is  festdev;  */ 

%mend ; 

/*sumstd(covprc,pref ix.outf ile)*/ 

data  test; 

input  danl-dan5  p; 

cards ; 

12345  .05 
23451  .1 
3  4  5  1  2  .2 
45123  .25 
5  1  2  3  4  .3 

run; 

y.sumstd (test , dan, out)  ; 

%macro  t ableit (inf ile , classl , class2 , f mt 1 , f mt2 , collen , cellf mt , var 
labell,label2, title) ; 

***collen  is  the  legth  of  the  column  headings. 

***cellfmt  is  the  format  for  the  cell  values. 

***var  is  the  list  variables  to  be  included  in  the  cells. 

***eg  if  prob  and  std  are  to  be  in  the  cells  then  var=prob  cell. 
♦^^Labell  and  label  two  are  the  names  of  the  two  class  variables 
=t:**For  example  labell  might  be  time-to-onset  and  label2  might 
***be  time-to-death. 

***Title  is  the  title  of  the  table.*/ 
options  ls=78  missing=’0’  nodate; 

7olet  rts  =  '/oevaK&collen  +  fecollen  +  4); 


proc  tabulate  data  =  feinfile  format =&cellfmt  ; 
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format  feclassl  &fmtl..  &class2  &fmt2..; 
class  feclassl  &class2; 
var  &var; 

table  &classl*(&var) ,  sum*&class2/rts=  &rts  ; 
keylabel  sum  =  ’  ’ ; 

label  &classl="&labell"  &class2="&label2" ; 

title  "fetitle"; 

rim; 

options  missing  =  ’ . ’ ; 

'/.mend ; 


/*tableit (infile , class 1 ,  class2 ,  fmt 1 , f mt2 , collen , cellf mt , var , 
labell , label2, title) ; */ 

/**Exmaple — see  tabling. sas  for  max  and  test2 

‘/otableit  (test2 ,  ni ,  nj  ,  interv ,  interv ,  &max ,  &max  .A,  prob  std ,  t  ime-to-onset , 
time-to-death, Joint  probability  table);  */ 

/**newtime5 . sas  3/30/98*/ 

“/omacro  ddformat (infile, numt,numdis,numdeth,pref) ; 

***This  macro  takes  an  infile  that  contains  ordered  times  tl  *** 
***through  t&numt  and  creates  formats  for  disease  and  death  *** 
***times  called  fepref.dis  and  &pref.deth,  resp.  Disease  has  *** 
***numdis  rows  of  which  the  first  row  is  disease  missing  and  ♦** 

***the  other  intervals  are  defined  by  the  ts.  *** 

*=t:*Death  has  no  missing  values  and  numdeth  intervals  defined  *** 

***by  the  ts. 

/**The  following  data  step  puts  the  times  from  infile  to  be  used*** 
***as  endpoints  into  macro  variables  ti.*/ 
data  null; 
set  feinfile; 

'/odo  i  =  1  ’/oto  &numt; 

call  symputC't&i"  ,trim(left(t&i))) ; 

“/.end; 

stop; 

run; 


/**&pref .deth  is  the  format  for  the  death  times. 
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No  missing  values  allowed.**/ 

proc  format  library  =  library; 

value  fepref.deth  0  -<  &tl  = 

7odo  i  =  2  7oto  y.eval  (fenumdeth  -  1)  ; 

‘/olet  ii  =  y.evalC&i  -  1); 

&&t&ii  -<  &&t&i  =  " [&&t&ii,&&t&i) " 

'/end; 


'/let  jj  =  '/evalC&numdeth  -  1); 

&&t&jj  -  high  =  "[&&t&jj,+)" 


run; 


/**&pref .dis  is  the  format  for  the  disease  times  plus  a  missing  category.  **/ 

proc  format  library  =  library; 

value  ftpref.dis  .  ="death  w/o  disease" 

0  -  <&tl  =  "[O.&tD" 

‘/do  i  =  2  '/to  ‘/eval(&numdis  -  2); 

'/let  ii  =  */eval(&i  -  1); 

&fet&ii  -  <&&t&i  =  " [&&t&ii ,&&t&i) " 

'/end ; 

‘/let  jj  =  ‘/eval(&numdis  -2); 

&&t&jj  -  high  =  " [&&t&j j ,+) " ; 

run; 

/****The  following  code  assigns  the  intervals  defining  the  rectangles*** 
*****to  row  and  column  numbers.  These  formats  can  be  used  later  when*** 
*****tabling  fiinctions  of  the  joint  probability  distribution  with  *** 
*****only  knowledge  of  the  row  and  column  numbers.  **/ 

proc  format  library  =  library; 

value  &pref. column  1  =  "[0,&tl)" 

‘/do  i  =  2  ‘/to  ‘/eval  (fenumdeth  -  1)  ; 

'/let  ii  =  '/eval(&i  -  1); 

&i  =  "  [&&t&ii,&&t&i) " 

'/end ; 

'/let  jj  =  '/eval C&numdeth  -  1); 

fenvimdeth  =  "[&&t&jj,+)"  ; 

run; 


/**&pref .dis  is  the  format  for  the  disease  times  plus  a  missing  category.  **/ 


1  ="death  w/o  disease" 

2  =  "[0,&tl)" 


proc  format  library  =  library; 
value  fepref . row 
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’/odo  i  =  2  “/oto  y.evaK&numdis  -  2); 

%let  ii  =  y,eval(&i  -  1); 

'/olet  iii  =  y.evaK&i  +  1); 

&iii=  " ” 

'/.end ; 

y.let  jj  =  y.eval  (feniimdis  -  2); 

feniimdis  =  "  j  ; 

run; 

’/emend ; 

’/.macro  eminvals(infile,outfile,timesout,intobs,libdrive) ; 

***This  macro  takes  an  file  named  in  infile  that  *** 

★♦^contains  variables  obst,  obsd,  idr,  idL,  itr,  itL.  *** 

♦♦★obst  and  obsd  are  the  observed  times  for  time  to  *** 

♦^♦disease  and  time  to  death,  respectively.  *** 

***Idr  and  idL  aire  indicators  that  indicate  whether  *** 

♦♦♦time  to  death  is  censored  on  the  right  and/or  left,  ♦♦♦ 
♦♦♦respectively.  An  indicator  equal  to  1  indicates  ♦♦♦ 
♦♦♦censored.  Itr  and  itL  are  the  censoring  indicators  ♦♦♦ 
♦♦♦for  time  to  disease.  The  outfile  will  contain  ♦♦♦ 

♦♦♦proc  freq  output  for  the  cells  with  rows  for  disease^^^ 
♦♦♦and  columns  for  the  death  variable.  ♦♦♦ 

♦♦♦Intobs  is  the  niimber  of  tincensored  observations  ♦♦♦ 
♦♦♦desired  between  the  points  on  the  disease  and  death  ♦♦♦ 
♦♦♦interval.  Timesout  is  a  file  containing  the  times  ♦♦♦ 

♦♦♦used  to  create  the  proc  freq  table  and  format.  ♦♦♦ 

♦♦♦Disease  times  and  deathtimes  are  chosen  separately  ♦♦♦ 

♦♦♦with  intomb  imcensored  times  between  them.  Then  ♦♦♦ 

♦♦♦they  are  put  into  one  set  of  sorted  points.  The  ♦♦♦ 

♦♦♦resulting  contingency  table  will  be  upper  right  ♦♦♦ 
♦♦♦tricingulax .  ♦♦♦ 

♦♦♦NOTE:  If  this  macro  is  used  in  a  simulation  then  ♦♦♦ 
♦♦♦remove  the  comments  around  no  print  in  the  proc  freq^^^ 
♦♦♦statement  at  the  end  of  the  macro.  ♦♦♦ 

libname  library  "&libdrive"; 

/♦♦♦♦The  next  data  step  keeps  only  the  uncensored  pairs  and  ♦♦♦ 
♦♦♦♦♦those  pairs  with  uncensored  death  time  aind  disease  time^^^ 
♦♦♦♦♦known  to  have  occurred  after  death.  ♦♦/ 

data  actual; 
set  feinfile; 
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siimd  =  idr  +  idl; 

sum  =idr  +  idl  +  itr  +itl; 

if  sum=0  or  /^disease  and  death  times  known*/ 

(obst  =  .  and  sumd  =  0) ;  /*death  known  to  have  occurred 

before  disease  and  death  time  known*/ 

keep  obst  obsd; 
run; 

/***This  data  step  keeps  only  the  imcensored  disease  times.***/ 

data  distimes; 

set  actual; 

if  obst  ne  . ; 

keep  obst; 

run; 

/**Uncensored  disease  times  are  sorted  next.**/ 

proc  sort  data=distimes ; 

by  obst ; 

run; 

/***The  user  indicates  he/she  want  intobs  in  each  interval.  *** 

*:t!**Xhe  following  data  step  pulls  the  intobsth.  time  from  the*** 
****sorted  list.  **/ 

data  distimes; 

set  distimes  ; 

if  not(mod(_n_,&intobs))  ; 

time  =  obst; 

keep  time  ; 

run; 

/***Tlie  following  data  step  sort  and  data  step  repeat  the  process  *** 

♦***just  completed  for  disease  times.***/ 

data  dethtime; 

set  actual; 

keep  obsd; 

run; 

proc  sort  data=dethtime ; 

by  obsd; 

run; 
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data  dethtime; 
set  dethtime  ; 

if  not(mod(_n_,&intobs))  ;  /*We  also  choose  the  largest  disease 

cind  death  times — not .  */ 

time  =  obsd; 
keep  time  ; 
run; 

/***The  next  data  step  combines  the  disease  and  deathtime  endpoints.*** 
****This  is  similar  to  what  Louis  et.  al.  number  III  suggest  for  *** 
****choosing  intervals.  **/ 

data  combined; 

set  dethtime  distimes; 

run; 

proc  sort  data  =  combined; 

by  time; 

run; 

/****repeated  values  of  combined  disease  and  death  uncensored  times*** 
*****  are  removed  next .  **/ 

data  combined; 
set  combined; 
if  _n_  =  1  then  b  =  time; 
else  do; 

if  time  =  b  then  delete; 
else  b  =  time; 
end; 

retain  b; 
drop  b  ; 
rim; 

/***Number  of  distinct  times  is  put  into  macro  variable  numtimes.***/ 
data  null; 

set  combined  end  =  last  nobs  =  n; 
call  symput ( 'numtimes ’ , left (trim (n))) ; 
stop ; 
run; 
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/***Output  data  set  of  times  is  created  next,  timesout  has  one 
>it!t!=t:*observation  which  is  an  array  of  length  numtimes  having 
♦**:<cprefix  t. 

data  fetimesout; 
set  combined  end  =  last; 
cirray  t{&numtimes}; 
retain  tl-t&numtimes; 
t(_n_)  =  time; 
if  last; 

keep  tl-t&numtimes; 
riin; 


***We  now  use  the  points  ti  in  the  outfile  from  the  **♦ 

***macro  eminvals  to  format  T  and  D,  where  T  is  time  to  *** 
***disease  and  D  is  time  to  death.  We  use  this  format  *** 
***in  a  PROC  FREQ  to  begin  reducing  the  intervals  to  a 
♦♦♦lattice  that  has  a  reasonable  number  of  imcensored  ♦♦♦ 
♦♦♦points  in  each  rectangle.  ♦♦♦ 


♦♦♦Clasfile  will  be  an  outfile.  It  is  the  original  datafile^^^ 


♦♦♦with  variables  row  and  column  indicating  the  row  and  ♦♦♦ 
♦♦♦column  membership  for  obst  and  obsd  The  row  is  the  ♦♦♦ 
♦♦♦disease  category  and  column  is  the  death  category.  ♦♦♦ 
♦♦♦Disease  row=  1  means  missing.  Rowl,  rowr,  columnl  and  ♦♦♦ 
♦♦♦columnr  are  the  similar  variables  for  obstl,  obstr,  ♦♦♦ 
♦♦♦obsdl  and  obsdr.  ♦♦♦ 


y.let  numdis  =  ’/.evaK&numtimes  +2); 

’/olet  numdeth  =  ‘/.evaK&numtimes  +  1); 

%ddf ormat (fetimesout ,&n\imtimes ,&numdis ,&numdeth,&pref) ; 

/♦♦put  comments  around  noprint  for  simulation  Txms.**/ 

proc  freq  data  =  actual  /♦noprint^/; 
title  ’Uncensored  disease  by  death  counts’; 
format  obst  fepref.dis.  obsd  fepref.deth.; 
table  obst^obsd/missing  sparse  out=  feoutfile; 
run; 

'/emend ; 


*♦♦ 

**/ 
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/*  eminvalsCinfile, outfile, times, intobs)  */ 

/^'/.eminvals (mortmorb, times ,20)  ; */ 

'/.macro  totrlrsk(ntimcovp,dencovp, outfile, title) ; 

y :4c :):  *  ^  *  ^  ^  *  *  *  *  ^  4: 3):  3):  4: 4=  *  *  *  *  *  *  *  *  4:  *  *  *  *  *  *  4c  =!< 

*44:This  macro  takes  two  matrices  of  estimated  joint  *** 
*4*probabilities  and  produces  the  relative  risk  of  *** 
***dying  with  disease  for  numcovp  vs  dencovp  groups.  *** 
4:44cnumcovp  is  a  SAS  data  set  for  the  numerator  of  the*** 
♦♦^relative  risk,  dencovp  is  a  SAS  data  set  for  the  *** 
***denominator  of  the  relative  risks.  Both  SAS  data  *** 
***sets  must  have  the  same  number  or  rows  and  columns*** 
***corresponding  to  the  same  set  of  rectangles  in  a  *** 
***two  dimensional  matrix.  The  columns  for  the  data  *** 
***sets  must  be  variables  named  covl-covK,p,  row  and  *** 
*4:*coliimn.  K  is  the  number  of  joint  non-zero  joint  *** 
***probabilities  for  the  rectangles  cind  also  the  *** 
♦♦♦number  of  rows.  Covl-covK  are  the  columns  of  the  *♦* 
♦♦♦covariance  matrix  for  the  estimates  of  the  K  joint^^^ 


♦♦♦probabilities  contained  in  variable  p.  Row  and  *** 
♦♦♦coliimn  are  the  row  and  column  indices,  indicating  *** 
♦**the  coordinates  of  the  rectangle  for  p  is  the  *** 
♦♦♦estimate  of  the  joint  probability.  The  row  and  *** 
♦♦♦column  values  for  both  data  sets  must  be  the  same  *** 
♦♦♦and  in  the  same  order.  *** 

♦**A  SAS  data  set  given  the  name  in  the  argument  *** 

♦♦♦outfile  will  contain  one  observation  with  *** 

♦♦♦variables  relrisk,  stdev  emd  column.  ♦♦♦ 

♦♦♦Title  will  be  the  title  of  the  printed  output .  ♦♦♦ 

*♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦  ♦♦♦♦♦♦♦♦♦/ 
/♦♦WARNING!!!  WARNING!!!  WARNING!!!  WARNING!!!*** 

♦♦♦  Be  sure  that  the  follwoing  include  statements  *** 
***have  the  correct  addresses.  *** 

***WARNING ! ! !  WARNING!!!  WARNING!!!  WARNING!!!**/ 


’/.include  ’  c :  \windows\desktop\mortmorb\submtrx .  sas  ’  ; 

'/.include  ’  c :  \windows\desktop\mortmorb\sumstd .  sas  ’  ; 

’/.include  ’  c :  \windows\desktop\mortmorb\ratiostd .  sas  ’ ; 

/:4t^^The  following  code  gets  the  dimension  of  the  pvector  for  the  two*** 
**^*groups  and  tests  to  see  if  they  are  equal.  *** 

:(.*****  :4c  ***♦♦♦♦♦♦♦♦♦♦♦♦♦♦♦**************************  *******************/ 

data  _null_; 

set  &numcovp  nobs=n  end  =  last; 
call  symput ( ’ rdim ’ , compress (n) ) ; 
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if  last  then  do; 

call  symputC’colnum’ .compress (column) ) ;  /**nximber  of  columns  put  into 

macro  variable  colnum**/ 

call  symputC’rownvim’ .compress (row)) ;  /**number  rows  in  rownum**/ 
end; 
run; 

data  _null_; 
set  &dencovp  nobs=n; 
if  n  =  ferdim  then  do; 
end  =  0; 

call  symput ( ’ end ’ . compress (end) ) ; 
stop; 

end; 

else  do; 

end  =  1; 

call  symput ( ’ end ’ . compress (end) ) ; 
stop; 

end; 

run; 

y,if  &end  =  1  */,then  7,do; 

y,put  The  dimensions  of  the  two  probability  vectors  are  miequal.; 

‘/.put  We  cannot  compute  the  requested  table . ; 

‘/oend; 

’/.else  '/.do; 

data  covnum;  is  the  prefix  for  the  numerator.**/ 

set  &numcovp; 
array  cov{&rdim}; 
array  rcov{&rdim}; 

'/.do  i  =  1  ’/.to  ferdim; 

rcov(&i)  =  cov(&i) ; 

’/.end; 


rrow  =  row; 
rcolumn  =  column; 

run; 

data  covden;  /***c  is  the  prefix  for  the  denominator.**/ 

set  fedencovp; 
array  cov{&rdim}; 
array  ccov{&rdim}; 

’/.do  i  =  1  ’/.to  &rdim; 
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ccov(&i)  =  cov(&i) ; 

y.end ; 


crow  =  row; 
ccolumn  =  column; 

run; 

data  bothcovp; 
merge  covnum  covden; 
r\m; 

/^♦♦testing  to  be  sure  the  row  and  column  indices  match.***/ 

data  _null_; 
set  bothcovp  ; 

if  rrow  ne  crow  or  rcolumn  ne  ccolumn  then  do; 
end  =  1; 

call  symput ( ’ end ’ , compress (end) ) ; 
stop; 

end; 

run; 

y,if  &end  =  1  “/.then  ‘/odo ; 

’/put  The  row  and/or  column  indices  for  the  two  vectors; 
’/put  Do  not  match.  ; 

’/put  We  must  stop  here .  ; 

’/end ; 

y,else  ’/do; 

data  covnum; 
set  covnum; 
row=rrow ; 
column  =  rcolumn; 

drop  rrow  rcolumn  rcovl-rcov&rdim  ; 
rim; 

data  covden; 
set  covden; 
row=crow; 
column  =  ccolumn; 

drop  crow  ccolumn  ccovl-ccov&rdim; 
run; 
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'/oSubmtrx (covnum ,2,1, ferownum, fecolnum ,numj cov)  ; 

‘/osubmtrx  (covden  ,2,1,  &rownmn ,  fecolnum ,  den j  cov)  ; 

’/«sumstd(numjcov,  cov,numout) ; 
y«snmstd(denjcov,cov,denout) ; 

/♦*The  files  numout  and  denout  contain  one  observation  each*** 

***with  variables  sum  and  stdev,  which  are  the  sum  of  the  *** 
***probabilities  in  the  column  and  the  standard  deviation  *** 

***of  the  sum,  respectively.  The  next  code  puts  these  *** 
***quantities  into  macro  variables  with  the  appropriate  *** 
***names .  *** 

data  _null_; 

set  numout; 

sig  =  stdev*stdev; 

call  symput  ( ’niomsig’ ,  compress  (sig) ) ; 
call  symput(’numsum’ , compress (sum)) ; 
stop; 
run; 

data  _null_; 

set  denout; 

sig  =  stdev*stdev; 

call  S3anput  ( ’  densig  ’ ,  compress  (sig) ) ; 
call  symput ( ’ densum ’ , compress (sum) ) ; 
stop; 
rim; 

/**!(! The  next  call  to  ratiostd  computes  the  relative  risk  for  row  j*** 
****along  with  its  standard  deviation.  *** 

*/,rat iostd  (fenumsum ,  fedensum ,  &numsig ,  fedensig ,  0 ,  ratout )  ; 

/***Now  the  relative  risk  eind  standard  deviation  will  be  put  into*** 
****the  output  data  set.  *** 

data  feoutfile; 

label  relrisk  =  ’Relative  Risk’  stdev  =  ’Standard  Deviation’; 
set  ratout; 
relrisk  =  ratio; 


keep  relrisk  stdev; 
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run; 

proc  print  data  =  feoutfile  noobs  label; 
title  "fetitle"; 
var  relrisk  stdev; 
run; 

"/oend; 

7oend; 

'/.mend ; 

/^example*/ 

/*t otrlrskCnumcovp , dencovp , outf ile .title)  */ 

libname  library  ’c:\windows\desktop\mortmorb’;  /*where  formats  are  stored 
/^options  nosymbolgen  nonotes;  */ 

/*°/.totrlrsk(rf illd, cfilld, final ,  Total  Relative  Risk  of  Dying  with  disease);*/ 
libname  library  ’ c : \windows\desktop\mortmorb ’ ; 
proc  format  library  =  library; 

value  dis  .  =  Meath  wo  disease’ 

0-  15  =  ’ [0,15)’ 

15-  20  =  ’  [15,20)’ 

20-  25  =  ’  [20,25) ’ 

25  -30  =  ’ [25,30)’ 

30-  35  =  ’ [30,35)’ 

35-  high  =  ’[35,+)’; 

run; 

proc  format  library  =  library; 
value  deth  0-  15  =  ’[0,15)’ 

15-  20  =  ’ [15,20)’ 

20-  25  =  ’  [20,25)’ 

25  -30  =  ’  [25,30) ’ 

30-  35  =  ’ [30,35) ’ 

35-  high  =  ’ [35,+) ’ ; 

riin; 

data  times; 
input  tl-t5; 
cards ; 

15  20  25  30  35 


run; 


